Non-Asymptotic and Second-Order Achievability
Bounds for Coding With Side-Information 

Shun Watanabe*, Shigeaki Kuzuoka†, Vincent Y. F. Tan‡



Abstract 

We present novel non-asymptotic or finite blocklength achievability bounds for three side-information problems
in network information theory. These include (i) the Wyner-Ahlswede-Körner (WAK) problem of almost-lossless
source coding with rate-limited side-information, (ii) the Wyner-Ziv (WZ) problem of lossy source coding with
side-information at the decoder and (iii) the Gel'fand-Pinsker (GP) problem of channel coding with noncausal state
information available at the encoder. The bounds are proved using ideas from channel simulation and channel
resolvability. Our bounds for all three problems improve on all previous non-asymptotic bounds on the error
probability of the WAK, WZ and GP problems, in particular those derived by Verdú. Using our novel non-asymptotic
bounds, we recover the general formulas for the optimal rates of these side-information problems. Finally, we also
present achievable second-order coding rates by applying the multidimensional Berry-Esséen theorem to our new
non-asymptotic bounds. Numerical results show that the second-order coding rates obtained using our non-asymptotic
achievability bounds are superior to those obtained using existing finite blocklength bounds.
Index Terms

Source coding, channel coding, side-information, Wyner-Ahlswede-Körner, Wyner-Ziv, Gel'fand-Pinsker, finite
blocklength, non-asymptotic, second-order coding rates

I. Introduction 



The study of network information theory [1] involves characterizing the optimal rate regions or capacity regions for
problems involving compression and transmission from multiple sources to multiple destinations. Apart from a few
special channels or source models, optimal rate regions and capacity regions for many network information theory
problems are still not known. In this paper, we revisit three coding problems whose asymptotic rate characterizations
are well known. These include



• The Wyner-Ahlswede-Körner (WAK) problem of almost-lossless source coding with rate-limited (aka coded)
side-information [2], [3],

• The Wyner-Ziv (WZ) problem of lossy source coding with side-information at the decoder [4], and

• The Gel'fand-Pinsker (GP) problem of channel coding with noncausal state information at the encoder [5].

These problems fall under the class of coding problems with side-information. That is, a subset of terminals has 
access to either a correlated source or the state of the channel. In most cases, this knowledge helps to strictly 
improve the rates of compression or transmission over the case where there is no side-information. 

While the study of asymptotic characterizations of network information theory problems has been of key interest
and importance for the past 50 years, it is also important to analyze non-asymptotic (or finite blocklength) limits
of various network information theory problems. This is because there may be hard constraints on decoding
complexity or delay in modern, heavily-networked systems. This paper derives new non-asymptotic bounds on
the error probability for the WAK and GP problems as well as the probability of excess distortion for the WZ
problem. Our bounds improve on all existing finite blocklength bounds for these problems such as those in [6]. In
addition, we use these bounds to recover known general formulas [7]-[10] and we also derive achievable
second-order coding rates [11], [12] for these side-information problems.

* Department of Information Science and Intelligent Systems, University of Tokushima, Email: shun-wata@is.tokushima-u.ac.jp
† Department of Computer and Communication Sciences, Wakayama University, Email: kuzuoka@ieee.org
‡ Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), and Department of Electrical and
Computer Engineering (ECE), National University of Singapore (NUS), Emails: tanyfv@i2r.a-star.edu.sg, vtan@nus.edu.sg






Traditionally, the achievability proofs of the direct parts of these coding problems share a common structure: they involve
a covering step, a packing step and finally the use of the Markov lemma [2] (also known as the conditional typicality
lemma in the book by El Gamal and Kim [1]). As such, to prove tighter bounds, it is necessary to develop new proof
techniques in place of the Markov lemma, the covering and packing lemmas [1] and their non-asymptotic versions [6],
[7]. These new techniques are based on the notions of channel resolvability [7], [13], [14] and channel simulation
[15]-[19]. We use the former in the helper's code construction. Some historical remarks on the use of channel
simulation in coding problems are given in Section I-B.

To illustrate our idea at a high level, let us use the WAK problem as a canonical example of all three problems of
interest. Recall that in the classical WAK problem, there is an independent and identically distributed (i.i.d.) joint
source $P_{X^n Y^n}(x^n, y^n) = \prod_{i=1}^n P_{XY}(x_i, y_i)$. The main source $X^n \sim P_X^n$ is to be reconstructed almost losslessly from
rate-limited versions of both $X^n$ and $Y^n$, where $Y^n$ is a correlated random variable regarded as side-information or
helper. See Fig. 1. The compression rates of $X^n$ and $Y^n$ are denoted as $R_1$ and $R_2$ respectively. The optimal rate
region is the set of rate pairs $(R_1, R_2)$ for which there exists a reliable code, that is, one whose error probability
can be made arbitrarily small with increasing blocklengths. WAK [2], [3] showed that the optimal rate region is

$$R_1 \ge H(X|U), \qquad R_2 \ge I(U;Y) \tag{1}$$

for some $P_{U|Y}$. For the direct part, the helper encoder compresses the side-information and transmits a description
represented by $U^n$. By the covering lemma [1], this results in the rate constraint $R_2 \ge I(U;Y)$. The main encoder
then uses binning [20] as in the achievability proof of the Slepian-Wolf theorem [21] to help the decoder recover
$X$ given the description $U$. This results in the rate constraint $R_1 \ge H(X|U)$. The main idea in our proof of a
new non-asymptotic upper bound on the error probability for the WAK problem is that, mixed over some common
randomness of arbitrarily large cardinality, the joint distribution of $(U, Y)$ (in the one-shot notation) is close in the
variational distance sense to that of $(\hat{U}, Y)$, where $\hat{U}$ designates the chosen auxiliary codeword (found classically via joint
typicality encoding). As a result, by monotonicity and the data-processing lemma for the variational distance, it
can be shown that the joint distribution of $(U, Y, X)$ is also close to that of $(\hat{U}, Y, X)$. This means that in the asymptotic
($n$-fold i.i.d. repetition) setting, the triple $(\hat{U}, Y, X)$ is jointly typical with high probability. This technique thus
circumvents the need to use the so-called piggyback coding lemma (PBL) and the Markov lemma [2], which result
in much poorer estimates on the error probability.
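As a concrete instance of the first-order region in (1), the following minimal sketch evaluates the pair $(H(X|U), I(U;Y))$ for a toy binary source. The model is an assumption made purely for illustration and is not taken from the paper: $Y \sim \mathrm{Bern}(1/2)$, $X = Y \oplus \mathrm{Bern}(p)$ and a binary symmetric test channel $U = Y \oplus \mathrm{Bern}(q)$, so that $U - Y - X$ is a Markov chain. The function names `h2`, `conv` and `wak_rate_pair` are hypothetical.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)


def wak_rate_pair(p, q):
    """Return (H(X|U), I(U;Y)) in bits for the assumed toy source.

    Assumed model: Y ~ Bern(1/2), X = Y xor Bern(p), U = Y xor Bern(q).
    """
    r1 = h2(conv(p, q))  # H(X|U): X equals U xor Bern(p * q)
    r2 = 1.0 - h2(q)     # I(U;Y) = H(Y) - H(Y|U) with H(Y) = 1 bit
    return r1, r2


if __name__ == "__main__":
    # Sweep the test-channel crossover q to trace the boundary of (1).
    for q in (0.0, 0.05, 0.2, 0.5):
        r1, r2 = wak_rate_pair(0.1, q)
        print(f"q={q:.2f}  R1 >= {r1:.4f}  R2 >= {r2:.4f}")
```

Sweeping $q$ from 0 to 1/2 moves from the corner where the helper describes $Y$ fully ($R_2 = 1$, $R_1 = H(X|Y)$) to the corner where the helper is silent ($R_2 = 0$, $R_1 = H(X)$).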



A. Main Contributions 

We now describe the three main contributions in this paper. 

Our first main contribution in this paper is to show improved bounds on the probabilities of error for WAK, WZ
and GP coding. We briefly describe the form of the bound for WAK coding here. The primary part of the new
upper bound on the error probability $P_e(\Phi)$ for WAK coding depends on two positive constants $\gamma_b$ and $\gamma_c$ and is
essentially given by

$$P_e(\Phi) \lesssim \Pr(\mathcal{E}_c \cup \mathcal{E}_b) \tag{2}$$

where the covering error is

$$\mathcal{E}_c := \left\{ \log \frac{P_{Y|U}(Y|U)}{P_Y(Y)} > \gamma_c \right\} \tag{3}$$

and the binning error is

$$\mathcal{E}_b := \left\{ \log \frac{1}{P_{X|U}(X|U)} > \gamma_b \right\}. \tag{4}$$

The notation $\lesssim$ is not meant to be precise and, in fact, we are dropping several residual terms that do not contribute
to the second-order coding rates in the $n$-fold i.i.d. setting if $\gamma_b$ and $\gamma_c$ are chosen appropriately. This result is
stated precisely in Theorem 14. From (2), we deduce that in the $n$-fold i.i.d. setting, if we choose $\gamma_c$ and $\gamma_b$ to be
fixed numbers that are strictly larger than the mutual information $I(U;Y)$ and the conditional entropy $H(X|U)$
respectively, we are guaranteed that the error probability $P_e(\Phi)$ decays to zero. This follows from Khintchine's
law of large numbers [7, Ch. 1]. Thus, we recover the direct part of WAK's result. In fact, we can take this one
step further (Theorem 21) to obtain an achievable general formula (in the sense of Verdú-Han [7], [22]) for the
WAK problem with general source [7, Ch. 1]. This was previously done by Miyake-Kanaya [8] but their derivation
is based on a different non-asymptotic formula more akin to Wyner's PBL. Also, since we have the freedom to
design $\gamma_c$ and $\gamma_b$ as sequences instead of fixed positive numbers, if we let them be $O(1/\sqrt{n})$-larger than $I(U;Y)$ and
$H(X|U)$, then the error probability is smaller than a prescribed constant depending on the implied constants in the
$O(\cdot)$-notations. This follows from the multivariate Berry-Esséen theorem [23]. This bound is useful because it is
the probability of a union of two events, and $\mathcal{E}_c$ and $\mathcal{E}_b$ are both information spectrum [7] events, which are easy to analyze.

Secondly, the preceding discussion shows that the bound in (2) also yields an achievable second-order coding
rate [11], [12]. However, unlike in the point-to-point setting [11], [12], [24], the achievable second-order coding
rate is expressed in terms of a so-called dispersion matrix [25]. We can easily show that if $\mathscr{R}_{\mathrm{WAK}}(n, \varepsilon)$ is the set
of all rate pairs $(R_1, R_2)$ for which there exists a length-$n$ WAK code with error probability not exceeding $\varepsilon > 0$
(i.e., the $(n, \varepsilon)$-optimal rate region), then for any $P_{U|Y}$ and all $n$ sufficiently large, the set

$$\begin{bmatrix} I(U;Y) \\ H(X|U) \end{bmatrix} + \frac{\mathscr{S}(\mathbf{V}, \varepsilon)}{\sqrt{n}} + O\!\left(\frac{\log n}{n}\right) \tag{5}$$

is an inner bound to $\mathscr{R}_{\mathrm{WAK}}(n, \varepsilon)$. In (5), $\mathscr{S}(\mathbf{V}, \varepsilon) \subset \mathbb{R}^2$ denotes the analogue of the $Q^{-1}$ function [25] and it
depends on the covariance matrix $\mathbf{V}$ of the so-called information-entropy density vector

$$\begin{bmatrix} \log \dfrac{P_{Y|U}(Y|U)}{P_Y(Y)} & \log \dfrac{1}{P_{X|U}(X|U)} \end{bmatrix}^T. \tag{6}$$

The precise statement for the second-order coding rate for the WAK problem is given in Theorem 24. We see
from (5) that for a fixed test channel $P_{U|Y}$, the redundancy at blocklength $n$ in order to achieve an error probability
$\varepsilon > 0$ is governed by the term $\mathscr{S}(\mathbf{V}, \varepsilon)/\sqrt{n}$. The pre-factor of this term, namely $\mathscr{S}(\mathbf{V}, \varepsilon)$, is likened to the dispersion [24],
[26]-[28], and depends not only on the variances of the information and entropy densities but also on their correlations.
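To make the covariance matrix of the information-entropy density vector tangible, here is a small sketch that computes its mean and covariance for a toy binary source. The source model is an assumption for illustration only: $Y \sim \mathrm{Bern}(1/2)$, $X = Y \oplus \mathrm{Bern}(p)$, $U = Y \oplus \mathrm{Bern}(q)$ with $0 < p, q < 1$. The function name `density_moments` is hypothetical.

```python
import math


def density_moments(p, q):
    """Mean and covariance of [log P(Y|U)/P(Y), log 1/P(X|U)] in bits.

    Assumed toy model: Y ~ Bern(1/2), X = Y xor Bern(p), U = Y xor Bern(q).
    """
    r = p * (1 - q) + q * (1 - p)  # Pr(X != U)
    # Per-letter densities indexed by a = 1{Y != U} and b = 1{X != U}.
    i_dens = (1 + math.log2(1 - q), 1 + math.log2(q))
    h_dens = (-math.log2(1 - r), -math.log2(r))
    # Joint law of (a, b): a ~ Bern(q) and b = a xor Bern(p).
    cell = {(0, 0): (1 - q) * (1 - p), (0, 1): (1 - q) * p,
            (1, 0): q * p, (1, 1): q * (1 - p)}
    mean_i = sum(w * i_dens[a] for (a, b), w in cell.items())
    mean_h = sum(w * h_dens[b] for (a, b), w in cell.items())
    v11 = sum(w * (i_dens[a] - mean_i) ** 2 for (a, b), w in cell.items())
    v22 = sum(w * (h_dens[b] - mean_h) ** 2 for (a, b), w in cell.items())
    v12 = sum(w * (i_dens[a] - mean_i) * (h_dens[b] - mean_h)
              for (a, b), w in cell.items())
    return (mean_i, mean_h), ((v11, v12), (v12, v22))


if __name__ == "__main__":
    mean, cov = density_moments(0.1, 0.1)
    print("mean [I(U;Y), H(X|U)]:", mean)
    print("covariance matrix:", cov)
```

The means recover $I(U;Y)$ and $H(X|U)$, while the off-diagonal entry captures the correlation between the two densities that a scalar dispersion would miss.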

Thirdly, we note that the same flavour of non-asymptotic bounds and second-order coding rates holds
for the WZ and GP problems. In addition, since the canonical rate-distortion problem [29] is a special case of the
WZ problem, we show that our non-asymptotic achievability bound for the WZ problem, when suitably specialized,
yields the correct dispersion for lossy source coding [27], [28]. We do so using two methods: (i) the method of
types [30] and (ii) results involving the $D$-tilted information [28]. Finally, we not only improve on the existing
bounds for the GP problem [6], [10], but we also consider an almost sure cost constraint on the channel input.



B. Related Work 

Wyner [2] and Ahlswede-Körner [3] were the first to consider and solve the problem of almost-lossless source
coding with coded side-information. Weak converses were proved in [2], [3] and a strong converse was proved
in [31] using the "blowing-up lemma". An information spectrum characterization was provided by Miyake and
Kanaya [8], and Kuzuoka [32] leveraged the non-asymptotic bound that can be extracted from [8] to derive
the redundancy for the WAK problem. Verdú [6] strengthened the non-asymptotic bound and showed that the error
probability for the WAK problem is essentially bounded as

$$P_e(\Phi) \lesssim \Pr(\mathcal{E}_c) + \Pr(\mathcal{E}_b), \tag{7}$$

which is the result of applying the union bound to our bound in (2). Again, we use the notation $\lesssim$ to mean
that the residual terms do not affect the second-order coding rates. Bounds on the error exponent were derived by
Kelly-Wagner [33].
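The gap between the probability of a union, $\Pr(\mathcal{E}_c \cup \mathcal{E}_b)$, and the sum $\Pr(\mathcal{E}_c) + \Pr(\mathcal{E}_b)$ can be checked numerically. The sketch below does this exactly for a toy binary source by enumerating multinomial counts. Everything about the example is an assumption for illustration: the model ($Y \sim \mathrm{Bern}(1/2)$, $X = Y \oplus \mathrm{Bern}(p)$, $U = Y \oplus \mathrm{Bern}(q)$, $0 < p, q < 1$), the strict-inequality form of the two events, the per-block thresholds `gamma_c` and `gamma_b`, and the function name `union_vs_sum`. Residual terms of the actual bounds are ignored.

```python
import math


def union_vs_sum(n, p, q, gamma_c, gamma_b):
    """Exact (Pr(E_c or E_b), Pr(E_c), Pr(E_b)) for the assumed toy source.

    E_c: sum of log P(Y|U)/P(Y) over the block exceeds gamma_c (bits).
    E_b: sum of log 1/P(X|U) over the block exceeds gamma_b (bits).
    """
    r = p * (1 - q) + q * (1 - p)  # Pr(X != U)
    i_dens = (1 + math.log2(1 - q), 1 + math.log2(q))
    h_dens = (-math.log2(1 - r), -math.log2(r))
    # Per-letter law of (a, b) = (1{Y != U}, 1{X != U}).
    cell = {(0, 0): (1 - q) * (1 - p), (0, 1): (1 - q) * p,
            (1, 0): q * p, (1, 1): q * (1 - p)}
    p_union = p_c = p_b = 0.0
    for n00 in range(n + 1):
        for n01 in range(n - n00 + 1):
            for n10 in range(n - n00 - n01 + 1):
                n11 = n - n00 - n01 - n10
                counts = {(0, 0): n00, (0, 1): n01, (1, 0): n10, (1, 1): n11}
                # Multinomial probability of this count vector.
                log_w = math.lgamma(n + 1)
                for key, k in counts.items():
                    log_w += k * math.log(cell[key]) - math.lgamma(k + 1)
                w = math.exp(log_w)
                a = n10 + n11  # positions with Y != U
                b = n01 + n11  # positions with X != U
                e_c = (n - a) * i_dens[0] + a * i_dens[1] > gamma_c
                e_b = (n - b) * h_dens[0] + b * h_dens[1] > gamma_b
                if e_c:
                    p_c += w
                if e_b:
                    p_b += w
                if e_c or e_b:
                    p_union += w
    return p_union, p_c, p_b


if __name__ == "__main__":
    pu, pc, pb = union_vs_sum(20, 0.1, 0.1, gamma_c=12.0, gamma_b=15.0)
    print(f"union = {pu:.4f}, sum = {pc + pb:.4f}")
```

Because the two events overlap, the union probability is never larger than the sum, which is the qualitative reason a bound of the form (2) can beat one of the form (7).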

Wyner and Ziv [4] derived the rate-distortion function for lossy source coding with decoder side-information.
However, they did not consider the probability of excess distortion. Rather, the quantity of interest is the expected
distortion and, more precisely, they considered the constraint that the asymptotic expected distortion is below a
distortion threshold $D > 0$. The generalization of the WZ problem to general correlated sources was considered
by Iwata and Muramatsu [9], who showed that the general WZ function can be written as a difference of a limit
superior in probability and a limit inferior in probability, reflecting the covering and packing components in the
classical achievability proof. Bounds on the error exponent were provided by Kelly-Wagner [33].

The problem of channel coding with noncausal random state information was solved by Gel'fand and Pinsker [5].
Subsequent work by Costa showed that, remarkably, there is no rate loss in the Gaussian case [34]. This is done
by choosing the auxiliary random variable to be a linear combination of the channel input and the state. A general
formula for the GP problem (with general channel and general state) was provided by Tan [10]. An achievable
error exponent was derived by Moulin and Wang [35]. Tyagi and Narayan [36] proved the strong converse for
this problem and used it to derive a sphere-packing bound. For both the WZ and GP problems, Verdú [6] used
generalizations of the (asymptotic) packing and covering lemmas in El Gamal and Kim [1] to derive non-asymptotic
bounds on the probability of excess distortion (for WZ) and the average error probability (for GP). However, these
bounds yield slightly worse second-order rates because the main part of each bound is a sum of two or three probabilities
as in (7), rather than the probability of the union as in (2).

In our work, we derive tight non-asymptotic bounds by using ideas from channel resolvability [13], [7, Ch. 6] and
channel simulation [16]-[19] to replace the covering part and the Markov lemma. For a given channel $W$ and an input
distribution, channel resolvability concerns the approximation of the output distribution with as little
randomness at the input as possible. It was shown by Han and Verdú [13] that this problem is closely connected to
channel coding and channel identification. Hayashi also studied the channel resolvability problem [14] and derived
a non-asymptotic formula that is different from Han and Verdú's. We will leverage a key lemma in Hayashi [14]
to derive our finite blocklength bounds.

In [16], [17], Bennett et al. proposed the problem of simulating a channel with the aid of common randomness. An
application of channel simulation to simulate the test channel in the rate-distortion problem was first investigated
by Winter [18], and then extensively studied, mainly in the field of quantum information (e.g., [18], [38], [39]).
Cuff investigated the trade-off between the rates of the message and common randomness for channel simulation
[19]. In these works, channel resolvability is implicitly or explicitly used as a building block of channel
simulation. Although the idea of using channel simulation instead of the Markov lemma is motivated by the
above-mentioned works, we stress that the derivations of tight non-asymptotic bounds as in (2) are not straightforward
applications of channel simulation and are highly nontrivial; these derivations are technical contributions of this paper.
Indeed, our code construction for channel simulation is slightly different from those in the literature, and we also
introduce some bounding techniques that have not appeared elsewhere.

In [40], Yassaee et al. proposed an alternative approach for channel simulation, in which they essentially used the
(multi-terminal version of) intrinsic randomness [7, Ch. 2] instead of channel resolvability. Although their approach
can also be used to replace the Markov lemma, it is not yet clear whether our bound can also be derived from the
approach in [40]. One of the difficulties in applying the approach in [40] to non-asymptotic analysis is that the amount of
common randomness that can be used in the channel simulation is limited by the randomness of the sources involved
in a coding problem, which is not the case with the approach using channel resolvability. More precisely, the
channel simulation errors in both approaches involve terms stemming from the amounts of common randomness. In the
channel resolvability approach, we can make the amount of common randomness arbitrarily large, and thus make
that term arbitrarily small, which is not the case with the approach in [40]. See our Theorems 14, 17 and 19.

Our main motivation in this work is to derive tight finite blocklength bounds on the error probability (or probability
of excess distortion). We are also interested in second-order coding rates. The study of the asymptotic expansion
of the logarithm of the maximum number of codewords that are achievable for $n$ uses of a channel with maximum
error probability no larger than $\varepsilon$ was first done by Strassen [41]. This was re-popularized in recent times by
Kontoyiannis [42], Baron-Khojastepour-Baraniuk [43], Hayashi [11], [12], and Polyanskiy-Poor-Verdú [24] among
others. Other notable works in this area include those by Nomura-Han [44] for resolvability, Kostina-Verdú [28]
for lossy source coding and Wang-Ingber-Kochman [26] for joint source-channel coding. Second-order analyses for
network information theory problems were considered in Tan and Kosut [25] as well as by other authors [45]-[48].
However, this is the first work that considers second-order rates for problems with side-information that are not
straightforward extensions of other known results.

C. Paper Organization 

In Section II, we state our notation and formally define the three coding problems with side-information. We then
review existing asymptotic, non-asymptotic and error exponent-type results in Section III. In Section IV, we state
our new non-asymptotic, channel-simulation-type bounds for the three problems. We then use these bounds to re-derive
(the direct parts of the) known general formulas [8], [10] in Section V. Following that, we present achievable
second-order coding rates for these coding problems. We will see that just as in the Slepian-Wolf setting [25], [47],
the dispersion is in fact a matrix. In Section VII, we show via numerical examples that our non-asymptotic bounds
lead to larger $(n, \varepsilon)$-rate regions compared with [6]. Concluding remarks and directions for future work are provided
in Section VIII. To ensure that the main ideas are seamlessly communicated in the main text, we relegate all proofs
to the appendices.

1 Steinberg and Verdú also studied the channel simulation problem [37]. However, their problem formulation is slightly different from the
one in [16], [17].

Fig. 1. Illustration of the WAK problem

II. Preliminaries 

In this section, we introduce our notation and recall the WAK, WZ and GP problems. 



A. Notations 

Random variables (e.g., $X$) and their realizations (e.g., $x$) are in capital and lower case respectively. All random
variables take values in some alphabets which are denoted in calligraphic font (e.g., $\mathcal{X}$). The cardinality of $\mathcal{X}$, if finite,
is denoted as $|\mathcal{X}|$. Let the random vector $X^n := (X_1, \ldots, X_n)$ and similarly for a realization $x^n = (x_1, \ldots, x_n)$.
The set of all distributions supported on alphabet $\mathcal{X}$ is denoted as $\mathscr{P}(\mathcal{X})$. We will at times use the method of
types [30]. The joint distribution induced by a marginal distribution $P \in \mathscr{P}(\mathcal{X})$ and a channel law $V : \mathcal{X} \to \mathcal{Y}$ is
denoted interchangeably as $P \times V$ or $PV$. This should be clear from the context.

For a sequence $x^n = (x_1, \ldots, x_n) \in \mathcal{X}^n$ in which $|\mathcal{X}|$ is finite, its type or empirical distribution is the probability
mass function $P(x) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{x_i = x\}$. The set of types with denominator $n$ supported on alphabet $\mathcal{X}$ is denoted
as $\mathscr{P}_n(\mathcal{X})$. The type class of $P$ is denoted as $\mathcal{T}_P := \{x^n \in \mathcal{X}^n : x^n \text{ has type } P\}$. For a sequence $x^n \in \mathcal{T}_P$, the set
of sequences $y^n \in \mathcal{Y}^n$ such that $(x^n, y^n)$ has joint type $P \times V = P(x)V(y|x)$ is the $V$-shell $\mathcal{T}_V(x^n)$. Let $\mathscr{V}_n(\mathcal{Y}; P)$
be the family of stochastic matrices $V : \mathcal{X} \to \mathcal{Y}$ for which the $V$-shell of a sequence of type $P \in \mathscr{P}_n(\mathcal{X})$ is not
empty. Information-theoretic quantities are denoted in the usual way. For example, $I(X;Y)$ and $I(P, V)$ denote
the mutual information, where the latter expression makes clear that the joint distribution of $(X, Y)$ is $P \times V$. All
logarithms are with respect to base 2 so information quantities are measured in bits.
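The notions of type and type class can be illustrated with a few lines of code. The sketch below computes the empirical distribution of a sequence, the number of types with denominator $n$, and the size of a type class via the multinomial coefficient. The function names are hypothetical and exact rational arithmetic is used only for readability.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial


def empirical_type(seq, alphabet):
    """Type of seq: the fraction of positions equal to each symbol."""
    n = len(seq)
    counts = Counter(seq)
    return {a: Fraction(counts[a], n) for a in alphabet}


def num_types(n, alphabet_size):
    """Number of types with denominator n on an alphabet of the given size."""
    return comb(n + alphabet_size - 1, alphabet_size - 1)


def type_class_size(type_, n):
    """Size of the type class: n! divided by the product of (n P(x))!."""
    size = factorial(n)
    for prob in type_.values():
        size //= factorial(int(prob * n))
    return size


if __name__ == "__main__":
    t = empirical_type("aabab", "ab")
    print(t, num_types(5, 2), type_class_size(t, 5))
```

The number of types grows only polynomially in $n$ while type classes grow exponentially, which is what makes the method of types effective.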

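The multivariate normal distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is denoted as $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. The
complementary Gaussian cumulative distribution function is denoted as $Q(t) := \int_t^{\infty} \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \, du$ and its inverse as
$Q^{-1}(\varepsilon) := \min\{t \in \mathbb{R} : Q(t) \le \varepsilon\}$. Finally, $|z|^+ := \max\{z, 0\}$, and $\mathbf{1}\{x \in \mathcal{A}\} = 1$ if $x \in \mathcal{A}$ and $0$ otherwise.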
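For numerical work, the complementary Gaussian distribution function and its inverse can be evaluated with the Python standard library. This is a minimal sketch: the names `Q` and `Q_inv` are chosen here for illustration, and `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist


def Q(t):
    """Complementary standard Gaussian CDF: Pr(Z > t) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def Q_inv(eps):
    """Inverse of Q on (0, 1); uses the symmetry Q(t) = Phi(-t)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return -NormalDist().inv_cdf(eps)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(f"eps={eps}: Q_inv={Q_inv(eps):.4f}")
```

In the scalar (point-to-point) setting, a typical use is the normal approximation, where the rate backoff at blocklength $n$ scales as $Q^{-1}(\varepsilon)/\sqrt{n}$ times the square root of a variance term.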



B. The Wyner-Ahlswede-Körner (WAK) Problem

In this section, we recall the WAK problem of lossless source coding with coded side-information [2], [3]. Let
us consider a correlated source $(X, Y)$ taking values in $\mathcal{X} \times \mathcal{Y}$ and having joint distribution $P_{XY}$. Throughout,
$X$, a discrete random variable, is the main source while $Y$ is the helper or side-information. The WAK problem
involves reconstructing $X$ losslessly given rate-limited (or coded) versions of both $X$ and $Y$. See Fig. 1.

Definition 1. A (possibly stochastic) source coding with side-information code or Wyner-Ahlswede-Körner (WAK)
code $\Phi = (f, g, \psi)$ is a triple of mappings that includes two encoders $f : \mathcal{X} \to \mathcal{M}$ and $g : \mathcal{Y} \to \mathcal{L}$ and a decoder
$\psi : \mathcal{M} \times \mathcal{L} \to \mathcal{X}$. The error probability of the WAK code $\Phi$ is defined as

$$P_e(\Phi) := \Pr\{X \ne \psi(f(X), g(Y))\}. \tag{8}$$

In the following, we may call $f$ the main encoder and $g$ the helper.
Fig. 2. Illustration of the WZ problem with probability of excess distortion criterion



In Section VI, we consider $n$-fold i.i.d. extensions of $X$ and $Y$, denoted as $X^n$ and $Y^n$. In this case, we use
the subscript $n$ to specify the blocklength, i.e., the code is $\Phi_n = (f_n, g_n, \psi_n)$ and the compression index sets are
$\mathcal{M}_n = f_n(\mathcal{X}^n)$ and $\mathcal{L}_n = g_n(\mathcal{Y}^n)$. In this case, we can define the pair of rates of the code as

$$R_1(\Phi_n) := \frac{1}{n} \log |\mathcal{M}_n|, \tag{9}$$

$$R_2(\Phi_n) := \frac{1}{n} \log |\mathcal{L}_n|. \tag{10}$$

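Definition 2. The $(n, \varepsilon)$-optimal rate region for the WAK problem $\mathscr{R}_{\mathrm{WAK}}(n, \varepsilon)$ is defined as the set of all pairs
of rates $(R_1, R_2)$ for which there exists a blocklength-$n$ WAK code $\Phi_n$ with rates at most $(R_1, R_2)$ and with error
probability not exceeding $\varepsilon$. In other words,

$$\mathscr{R}_{\mathrm{WAK}}(n, \varepsilon) := \left\{ (R_1, R_2) \in \mathbb{R}_+^2 : \exists\, \Phi_n \text{ s.t. } \frac{1}{n} \log |\mathcal{M}_n| \le R_1,\ \frac{1}{n} \log |\mathcal{L}_n| \le R_2,\ P_e(\Phi_n) \le \varepsilon \right\}. \tag{11}$$

We also define the asymptotic rate regions

$$\mathscr{R}_{\mathrm{WAK}}(\varepsilon) := \mathrm{cl}\left[ \bigcup_{n \ge 1} \mathscr{R}_{\mathrm{WAK}}(n, \varepsilon) \right], \tag{12}$$

$$\mathscr{R}_{\mathrm{WAK}} := \bigcap_{0 < \varepsilon < 1} \mathscr{R}_{\mathrm{WAK}}(\varepsilon), \tag{13}$$

where cl denotes set closure in $\mathbb{R}^2$.

In the following, we will provide an inner bound to $\mathscr{R}_{\mathrm{WAK}}(n, \varepsilon)$ that improves on inner bounds that can be
derived from previously obtained non-asymptotic bounds on $P_e(\Phi_n)$ [6], [32].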



C. The Wyner-Ziv (WZ) Problem 

In this section, we recall the WZ problem of lossy source coding with full side-information at the decoder [4].
Here, as in the WAK problem, we have a correlated source $(X, Y)$ taking values in $\mathcal{X} \times \mathcal{Y}$ and having joint
distribution $P_{XY}$. Again, $X$ is the main source and $Y$ is the helper or side-information. Neither $X$ nor $Y$ has to be
a discrete random variable. Unlike the WAK problem, it is not required to reconstruct $X$ exactly; rather, a distortion
$D$ between $X$ and its reproduction $\hat{X}$ is allowed. Let $\hat{\mathcal{X}}$ be the reproduction alphabet and let $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$
be a bounded distortion measure such that for every $x \in \mathcal{X}$ there exists an $\hat{x} \in \hat{\mathcal{X}}$ such that $d(x, \hat{x}) = 0$ and
$\max_{x, \hat{x}} d(x, \hat{x}) = D_{\max} < \infty$. See Fig. 2.

Definition 3. A (possibly stochastic) lossy source coding with side-information or Wyner-Ziv (WZ) code $\Phi = (f, \psi)$
is a pair of mappings that includes an encoder $f : \mathcal{X} \to \mathcal{M}$ and a decoder $\psi : \mathcal{M} \times \mathcal{Y} \to \hat{\mathcal{X}}$. The probability of
excess distortion for the WZ code $\Phi$ at distortion level $D$ is defined as

$$P_e(\Phi; D) := \Pr\{d(X, \psi(f(X), Y)) > D\}. \tag{14}$$

We will again consider $n$-fold extensions of $X$ and $Y$, denoted as $X^n$ and $Y^n$, in Section VI. The code is indexed
by the blocklength as $\Phi_n = (f_n, \psi_n)$. Furthermore, the compression index set is denoted as $\mathcal{M}_n = f_n(\mathcal{X}^n)$. The
rate of the code $\Phi_n$ is defined as

$$R(\Phi_n) := \frac{1}{n} \log |\mathcal{M}_n|. \tag{15}$$

The distortion between two length-$n$ sequences $x^n \in \mathcal{X}^n$ and $\hat{x}^n \in \hat{\mathcal{X}}^n$ is defined as

$$d_n(x^n, \hat{x}^n) := \frac{1}{n} \sum_{i=1}^n d(x_i, \hat{x}_i). \tag{16}$$
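The block distortion in (16) and the excess-distortion event underlying (14) are straightforward to compute for given sequences. The following minimal sketch uses the Hamming distortion as an assumed example of a bounded distortion measure; the function names are hypothetical.

```python
def hamming(x, x_hat):
    """Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0.0 if x == x_hat else 1.0


def block_distortion(xs, x_hats, d=hamming):
    """Average per-letter distortion between two equal-length sequences."""
    if len(xs) != len(x_hats) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d(a, b) for a, b in zip(xs, x_hats)) / len(xs)


def excess_distortion(xs, x_hats, level, d=hamming):
    """True if the block distortion strictly exceeds the distortion level."""
    return block_distortion(xs, x_hats, d) > level


if __name__ == "__main__":
    source = [0, 1, 1, 0]
    reproduction = [0, 1, 0, 0]
    print(block_distortion(source, reproduction))
    print(excess_distortion(source, reproduction, 0.2))
```

Averaging the indicator returned by `excess_distortion` over many source realizations would give a Monte Carlo estimate of the probability of excess distortion for a given code.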

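Definition 4. The $(n, \varepsilon)$-Wyner-Ziv rate-distortion region $\mathscr{R}_{\mathrm{WZ}}(n, \varepsilon) \subset \mathbb{R}_+^2$ is the set of all rate-distortion pairs
$(R, D)$ for which there exists a blocklength-$n$ WZ code $\Phi_n$ at distortion level $D$ with rate at most $R$ and probability
of excess distortion not exceeding $\varepsilon$. In other words,

$$\mathscr{R}_{\mathrm{WZ}}(n, \varepsilon) := \left\{ (R, D) \in \mathbb{R}_+^2 : \exists\, \Phi_n \text{ s.t. } \frac{1}{n} \log |\mathcal{M}_n| \le R,\ P_e(\Phi_n; D) \le \varepsilon \right\}. \tag{17}$$

We also define the asymptotic rate-distortion regions

$$\mathscr{R}_{\mathrm{WZ}}(\varepsilon) := \mathrm{cl}\left[ \bigcup_{n \ge 1} \mathscr{R}_{\mathrm{WZ}}(n, \varepsilon) \right], \tag{18}$$

$$\mathscr{R}_{\mathrm{WZ}} := \bigcap_{0 < \varepsilon < 1} \mathscr{R}_{\mathrm{WZ}}(\varepsilon). \tag{19}$$

The $(n, \varepsilon)$-Wyner-Ziv rate-distortion function $R_{\mathrm{WZ}}(n, \varepsilon, D)$ is defined as

$$R_{\mathrm{WZ}}(n, \varepsilon, D) := \inf\{R : (R, D) \in \mathscr{R}_{\mathrm{WZ}}(n, \varepsilon)\}. \tag{20}$$

We also define the asymptotic rate-distortion functions

$$R_{\mathrm{WZ}}(\varepsilon, D) := \inf\{R : (R, D) \in \mathscr{R}_{\mathrm{WZ}}(\varepsilon)\}, \tag{21}$$

$$R_{\mathrm{WZ}}(D) := \lim_{\varepsilon \to 0} R_{\mathrm{WZ}}(\varepsilon, D). \tag{22}$$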


Note that the use of the limit (as opposed to the limit superior or limit inferior) in (22) is justified because
$R_{\mathrm{WZ}}(\varepsilon, D)$ is, from its definition, monotonically non-increasing in $\varepsilon$. In the sequel, we will provide an inner bound
to $\mathscr{R}_{\mathrm{WZ}}(n, \varepsilon)$ and thus an upper bound on $R_{\mathrm{WZ}}(n, \varepsilon, D)$ by appealing to a new non-asymptotic upper bound on the
probability of excess distortion $P_e(\Phi_n; D)$. In addition, note that if $Y = \emptyset$, i.e., side-information is not available,
this reduces to the point-to-point rate-distortion (lossy source coding) problem.

Conventionally [1], [4], the WZ problem is stated not with the probability of excess distortion criterion but with
the average fidelity criterion. That is, the requirement that $P_e(\Phi_n; D) \to 0$ (implicit in (22)) is replaced by

$$\limsup_{n \to \infty} \mathbb{E}[d_n(X^n, \psi_n(f_n(X^n), Y^n))] \le D. \tag{23}$$



D. The Gel'fand-Pinsker (GP) Problem

In the previous two subsections, we dealt exclusively with source coding problems, either lossless (WAK) or lossy
(WZ). In this section, we review the setup of the GP problem, which involves channel coding with noncausal
state information at the encoder. It is the dual to the WZ problem [49]. In this problem, there is a state-dependent
channel $W : \mathcal{X} \times \mathcal{S} \to \mathcal{Y}$ and a random variable representing the state $S$ with distribution $P_S$ taking values in some
set $\mathcal{S}$. A message $M$ chosen uniformly at random from $\mathcal{M}$ is to be sent and the encoder has information about
which message is to be sent as well as the channel state information $S$, which is known noncausally. (Noncausality
only applies when the blocklength is larger than 1.) It is assumed that the message and the state are independent.
Let $g : \mathcal{X} \to [0, \infty)$ be some cost function. The encoder $f$ encodes the message and state into a codeword (channel
input) $X = f(M, S)$ that satisfies the cost constraint

$$g(X) \le \Gamma, \tag{24}$$

for some $\Gamma \ge 0$ with high probability. See the precise definition/requirement in (25) as well as Proposition 1. The
decoder receives the channel output $Y | \{X = x, S = s\} \sim W(\cdot | x, s)$ and decides which message was sent via a
decoder $\psi : \mathcal{Y} \to \mathcal{M}$. See Fig. 3. More formally, we have the following definition.
Fig. 3. Illustration of the GP problem
Definition 5. A (possibly stochastic) code for the channel coding problem with noncausal state information or
Gel'fand-Pinsker (GP) code $\Phi = (f, \psi)$ is a pair of mappings that includes an encoder $f : \mathcal{M} \times \mathcal{S} \to \mathcal{X}$ and a
decoder $\psi : \mathcal{Y} \to \mathcal{M}$. The average probability of error for the GP code is defined as

$$P_e(\Phi; \Gamma) := \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} \sum_{s \in \mathcal{S}} P_S(s) \sum_{y \in \mathcal{Y}} W(y | f(m, s), s) \, \mathbf{1}\{g(f(m, s)) > \Gamma \,\cup\, y \in \mathcal{Y} \setminus \psi^{-1}(m)\}. \tag{25}$$

More simply, $P_e(\Phi; \Gamma) = \Pr(\{g(f(M, S)) > \Gamma\} \cup \{\hat{M} \ne M\})$ where $M$ is uniform on $\mathcal{M}$ and independent of
$S \sim P_S$, $\hat{M} := \psi(Y)$ and $Y$ is the random variable whose conditional distribution given $M = m$ and $S = s$ is
$W(\cdot | f(m, s), s)$.
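For small alphabets, the average error probability in (25) can be evaluated by direct enumeration. The sketch below does so for a deterministic encoder and decoder. The toy code is an assumption for illustration and is not a construction from the paper: a binary channel $Y = X \oplus S \oplus Z$ with $Z \sim \mathrm{Bern}(\delta)$, an encoder that pre-cancels the state, the identity decoder and the cost $g(x) = x$. All function names are hypothetical.

```python
from itertools import product


def gp_error_probability(messages, states, outputs, p_s, w, f, psi, g, gamma):
    """Evaluate the error criterion by summing over (m, s, y).

    A triple contributes when the cost constraint is violated or when the
    decoder output differs from the transmitted message.
    """
    total = 0.0
    for m, s in product(messages, states):
        x = f(m, s)
        for y in outputs:
            if g(x) > gamma or psi(y) != m:
                total += p_s[s] * w(y, x, s)
    return total / len(messages)


def toy_error_probability(delta, gamma):
    """Error probability of the assumed toy code at cost level gamma."""
    bits = (0, 1)
    p_s = {0: 0.5, 1: 0.5}                                  # uniform state
    w = lambda y, x, s: 1 - delta if y == x ^ s else delta  # Y = X xor S xor noise
    f = lambda m, s: m ^ s                                  # pre-cancel the state
    psi = lambda y: y                                       # identity decoder
    g = lambda x: float(x)                                  # cost of sending a 1
    return gp_error_probability(bits, bits, bits, p_s, w, f, psi, g, gamma)


if __name__ == "__main__":
    print(toy_error_probability(0.1, gamma=1.0))  # cost never violated
    print(toy_error_probability(0.1, gamma=0.0))  # sending a 1 violates the cost
```

With $\Gamma = 1$ the cost constraint is never active and only channel noise causes errors, whereas with $\Gamma = 0$ every input equal to 1 counts as an error regardless of the channel output, which shows how the cost violation is folded into the error event.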

The following proposition, which will be proved in Appendix A, guarantees that we can always convert a code
in the sense of Definition 5 into a code in the sense of an almost sure cost constraint.



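Proposition 1 (Expurgated Code). Let the set of admissible inputs in $\mathcal{X}$ be

$$\mathcal{T}_g^{\mathrm{GP}}(\Gamma) := \{x \in \mathcal{X} : g(x) \le \Gamma\}. \tag{26}$$

For any (stochastic) encoder $P_{X|MS}$ (this plays the role of $f$ in Definition 5) and decoder $P_{\hat{M}|Y}$ (this plays the
role of $\psi$ in Definition 5), there exists an encoder $\tilde{P}_{X|MS}$ such that

$$\tilde{P}_X(\mathcal{T}_g^{\mathrm{GP}}(\Gamma)) = 1 \tag{27}$$

and

$$\tilde{P}_{MSXY\hat{M}}[\hat{m} \ne m] \le P_{MSXY\hat{M}}[g(x) > \Gamma \,\cup\, \hat{m} \ne m], \tag{28}$$

where $P_{MSXY\hat{M}} := P_M P_S P_{X|MS} W P_{\hat{M}|Y}$ and $\tilde{P}_{MSXY\hat{M}} := P_M P_S \tilde{P}_{X|MS} W P_{\hat{M}|Y}$.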



From Proposition 1, noting that $P_e((P_{X|MS}, P_{\hat{M}|Y}); \Gamma) = P_{MSXY\hat{M}}[g(x) > \Gamma \,\cup\, \hat{m} \ne m]$, we see that the
constraint in (24) is equivalent to $g(X) \le \Gamma$ almost surely (implied by (27)). For the purposes of deriving channel
simulation-based bounds in Section IV-C, it is easier to work with the error criterion in (25) so we adopt Definition 5.

In order to obtain achievable second-order coding rates for the GP problem, we consider $n$-fold i.i.d. extensions
of the channel and state. Hence, for every $(s^n, x^n, y^n)$, we have $W^n(y^n | x^n, s^n) = \prod_{i=1}^n W(y_i | x_i, s_i)$ and the state
$S^n$ evolves in a stationary, memoryless fashion according to $P_S$. For blocklength $n$, the code and message set are
denoted as $\Phi_n = (f_n, \psi_n)$ and $\mathcal{M}_n$ respectively. The cost function is denoted as $g_n : \mathcal{X}^n \to [0, \infty)$ and is defined
as the average of the per-letter costs, i.e.,

$$g_n(x^n) := \frac{1}{n} \sum_{i=1}^n g(x_i). \tag{29}$$

For example, in the Gaussian GP problem (which is also known as dirty paper coding [34]), $g(x) = x^2$. This
corresponds to a power constraint and $\Gamma$ is the upper bound on the permissible power. The rate of the code is the
normalized logarithm of the number of messages, i.e.,

$$R(\Phi_n) := \frac{1}{n} \log |\mathcal{M}_n|. \tag{30}$$
n 
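Definition 6. The $(n, \varepsilon)$-GP capacity-cost region $\mathscr{C}_{\mathrm{GP}}(n, \varepsilon) \subset \mathbb{R}_+^2$ is the set of all rate-cost pairs $(R, \Gamma)$ for which
there exists a blocklength-$n$ GP code $\Phi_n$ with cost not exceeding $\Gamma$, with rate at least $R$ and probability of error
not exceeding $\varepsilon$. In other words,

$$\mathscr{C}_{\mathrm{GP}}(n, \varepsilon) := \left\{ (R, \Gamma) \in \mathbb{R}_+^2 : \exists\, \Phi_n \text{ s.t. } \frac{1}{n} \log |\mathcal{M}_n| \ge R,\ P_e(\Phi_n; \Gamma) \le \varepsilon \right\}. \tag{31}$$

We also define the asymptotic capacity-cost regions

$$\mathscr{C}_{\mathrm{GP}}(\varepsilon) := \mathrm{cl}\left[ \bigcup_{n \ge 1} \mathscr{C}_{\mathrm{GP}}(n, \varepsilon) \right], \tag{32}$$

$$\mathscr{C}_{\mathrm{GP}} := \bigcap_{0 < \varepsilon < 1} \mathscr{C}_{\mathrm{GP}}(\varepsilon). \tag{33}$$

The $(n, \varepsilon)$-capacity-cost function $C_{\mathrm{GP}}(n, \varepsilon, \Gamma)$ is defined as

$$C_{\mathrm{GP}}(n, \varepsilon, \Gamma) := \sup\{R : (R, \Gamma) \in \mathscr{C}_{\mathrm{GP}}(n, \varepsilon)\}. \tag{34}$$

We also define the asymptotic capacity-cost functions

$$C_{\mathrm{GP}}(\varepsilon, \Gamma) := \sup\{R : (R, \Gamma) \in \mathscr{C}_{\mathrm{GP}}(\varepsilon)\}, \tag{35}$$

$$C_{\mathrm{GP}}(\Gamma) := \lim_{\varepsilon \to 0} C_{\mathrm{GP}}(\varepsilon, \Gamma). \tag{36}$$

If the cost constraint (24) is absent (i.e., every codeword in $\mathcal{X}^n$ is admissible), we will write $C_{\mathrm{GP}}(n, \varepsilon)$ instead of
$C_{\mathrm{GP}}(n, \varepsilon, \infty)$, $P_e(\Phi_n)$ instead of $P_e(\Phi_n; \infty)$ and so on.

Once again, the limit in (36) exists because the function $C_{\mathrm{GP}}(\varepsilon, \Gamma)$ is monotonically non-decreasing in $\varepsilon$. In the
sequel, we will provide a lower bound on $C_{\mathrm{GP}}(n, \varepsilon, \Gamma)$ by appealing to a new non-asymptotic upper bound on the
average probability of error $P_e(\Phi_n; \Gamma)$.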

III. Review of Existing Results 
In this section, we review existing asymptotic and non-asymptotic results for the three problems in Section II.



A. Existing Results for the WAK Problem 

We first state the asymptotic optimal rate region for the WAK problem. Subsequently, we state some existing 
non-asymptotic bounds on the probability of error, defined in ©. 

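Let $\mathscr{P}(P_{XY})$ be the set of all joint distributions $P_{UXY}\in\mathscr{P}(\mathcal{U}\times\mathcal{X}\times\mathcal{Y})$ such that the $\mathcal{X}\times\mathcal{Y}$-marginal of $P_{UXY}$ is the source distribution $P_{XY}$, $U - Y - X$ forms a Markov chain in that order and$^2$ $|\mathcal{U}|\le|\mathcal{Y}|+1$. Define

$$\mathscr{R}^*_{\mathrm{WAK}} := \bigcup_{P_{UXY}\in\mathscr{P}(P_{XY})} \big\{(R_1,R_2)\in\mathbb{R}_+^2 : R_1\ge H(X|U),\ R_2\ge I(U;Y)\big\}. \tag{37}$$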

Wyner [2] and Ahlswede-Körner [3] proved the following:

Theorem 2 (Wyner [2], Ahlswede-Körner [3]). For every $0<\varepsilon<1$, we have

$$\mathscr{R}_{\mathrm{WAK}}(\varepsilon) = \mathscr{R}_{\mathrm{WAK}} = \mathscr{R}^*_{\mathrm{WAK}}, \tag{38}$$

where $\mathscr{R}_{\mathrm{WAK}}(\varepsilon)$ and $\mathscr{R}_{\mathrm{WAK}}$ are defined in (12) and (13) respectively.

To prove the direct part, Wyner used the PBL and the Markov lemma while Ahlswede-Körner used a maximal code construction. Only weak converses were provided in [2] and [3]. Ahlswede-Gács-Körner [31] proved the strong converse using entropy and image-size characterizations [30, Ch. 15], which are based on the so-called blowing-up lemma [30, Ch. 5]. See [30, Thm. 16.4].

$^2$ The cardinality bound on $\mathcal{U}$ in the definition of $\mathscr{P}(P_{XY})$ is only applied when we consider the single-letter characterization $\mathscr{R}^*_{\mathrm{WAK}}$, and it is not applied when we consider the non-asymptotic analysis. Similar remarks also apply to the WZ and GP problems.

Theorem 3 presents a non-asymptotic version of Wyner's bound and was proved recently by Kuzuoka [32] using the PBL technique and the Markov lemma [2]. For a fixed auxiliary alphabet $\mathcal{U}$, joint distribution $P_{UXY}\in\mathscr{P}(P_{XY})$ and arbitrary non-negative constants $\gamma_{\mathrm{b}}$ and $\gamma_{\mathrm{c}}$, we define two sets

$$\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}}) := \Big\{(u,x)\in\mathcal{U}\times\mathcal{X} : \log\frac{1}{P_{X|U}(x|u)}\le\gamma_{\mathrm{b}}\Big\}, \tag{39}$$

$$\mathcal{T}_{\mathrm{c}}^{\mathrm{WAK}}(\gamma_{\mathrm{c}}) := \Big\{(u,y)\in\mathcal{U}\times\mathcal{Y} : \log\frac{P_{Y|U}(y|u)}{P_Y(y)}\le\gamma_{\mathrm{c}}\Big\}. \tag{40}$$

These sets are similar to the typical sets used extensively in network information theory [1] but note that these sets only involve the entropy and information densities. Consequently, the probabilities of these sets (events) are entropy and information spectrum quantities [7]. The subscripts b and c refer respectively to binning and covering. Similar subscripts will be used in the sequel for the other side-information problems to demonstrate the similarities between the proof techniques, all of which leverage ideas from channel resolvability [7, Ch. 6], [14] and channel simulation [15]-[19].
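To make the sets in (39) and (40) concrete, the following Python sketch computes the probabilities of their complements, i.e., the entropy-spectrum and information-spectrum tails, for a small single-letter example. The toy distribution (a uniform binary $Y$ with $U$ and $X$ obtained by flipping $Y$ with probabilities 0.1 and 0.2) and all function names are assumptions chosen only for illustration; logarithms are taken to base 2.

```python
import math
from itertools import product

def joint_pmf(q_u=0.1, q_x=0.2):
    """Toy P_UXY with U - Y - X: Y ~ Bern(1/2); U and X are noisy copies of Y."""
    pmf = {}
    for u, x, y in product((0, 1), repeat=3):
        p_u_given_y = q_u if u != y else 1 - q_u
        p_x_given_y = q_x if x != y else 1 - q_x
        pmf[(u, x, y)] = 0.5 * p_u_given_y * p_x_given_y
    return pmf

def marginal(pmf, keep):
    """Marginalize a pmf keyed by (u, x, y) onto the coordinates in `keep`."""
    out = {}
    for key, p in pmf.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

def spectrum_tails(pmf, gamma_b, gamma_c):
    """Return (P_UX(T_b^c), P_UY(T_c^c)) for thresholds gamma_b, gamma_c."""
    p_u = marginal(pmf, (0,))
    p_y = marginal(pmf, (2,))
    p_ux = marginal(pmf, (0, 1))
    p_uy = marginal(pmf, (0, 2))
    # Entropy density log 1/P_{X|U}(x|u) exceeds gamma_b.
    tail_b = sum(p for (u, x), p in p_ux.items()
                 if math.log2(p_u[(u,)] / p) > gamma_b)
    # Information density log P_{Y|U}(y|u)/P_Y(y) exceeds gamma_c.
    tail_c = sum(p for (u, y), p in p_uy.items()
                 if math.log2(p / (p_u[(u,)] * p_y[(y,)])) > gamma_c)
    return tail_b, tail_c

tail_b, tail_c = spectrum_tails(joint_pmf(), gamma_b=1.0, gamma_c=1.0)
```

In this toy case the binning tail is the probability that $U\ne X$, while the covering tail vanishes because the largest information density is $\log_2 1.8 < 1$.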

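Theorem 3 (Kuzuoka [32]). For arbitrary $\gamma_{\mathrm{b}},\gamma_{\mathrm{c}}\ge 0$, there exists a WAK code $\Phi$ with error probability satisfying

$$P_{\mathrm{e}}(\Phi) \le 2\sqrt{P_{UX}\big(\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})^c\big)} + P_{UY}\big(\mathcal{T}_{\mathrm{c}}^{\mathrm{WAK}}(\gamma_{\mathrm{c}})^c\big) + \frac{2^{\gamma_{\mathrm{b}}}}{|\mathcal{M}|} + \exp\Big\{-\frac{|\mathcal{L}|}{2^{\gamma_{\mathrm{c}}}}\Big\}. \tag{41}$$

The first and second terms are the dominant ones for an appropriate choice of $\gamma_{\mathrm{b}}$ and $\gamma_{\mathrm{c}}$. They are entropy and information spectrum quantities that can be easily evaluated in the $n$-fold i.i.d. setting using, for example, the central limit theorem. Observe that the second term represents the encoding of $Y$ with $U$ and the first term represents the decoding of $X$ given $U$. The first term can be large due to the square root resulting from Wyner's PBL. Verdú [6] demonstrated a refined version of Theorem 3 in which the square root in the first term is removed.

Theorem 4 (Verdú [6]). For arbitrary $\gamma_{\mathrm{b}},\gamma_{\mathrm{c}}\ge 0$, there exists a WAK code $\Phi$ with error probability satisfying

$$P_{\mathrm{e}}(\Phi) \le P_{UX}\big(\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})^c\big) + P_{UY}\big(\mathcal{T}_{\mathrm{c}}^{\mathrm{WAK}}(\gamma_{\mathrm{c}})^c\big) + \frac{2^{\gamma_{\mathrm{b}}}}{|\mathcal{M}|} + \exp\Big\{-\frac{|\mathcal{L}|}{2^{\gamma_{\mathrm{c}}}}\Big\}. \tag{42}$$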

It is worth briefly describing the side-information encoder and the decoder used in Verdú's work [6]. Verdú defines the following metric

$$\pi(u,y) := \Pr\Big\{\log\frac{1}{P_{X|U}(X|u)} > \gamma_{\mathrm{b}} \,\Big|\, Y = y\Big\} \tag{43}$$

and the helper encoder searches for a codeword $u_l\in\mathcal{C}$ that minimizes $\pi(u_l,y)$ given the side-information $y$. The decoder examines the appropriate decompression bin for a codeword with entropy density $-\log P_{X|U}(x|u)$ not exceeding $\gamma_{\mathrm{b}}$. Verdú then uses generalizations of the covering and packing lemmas in [1] to prove (42). In Section IV-A, we further improve on Verdú's bound. We show that the two information spectrum terms in (42) (the first two terms) can be combined under a single probability. Thus, the derived achievable second-order coding rate is better than that obtained using Verdú's bound in (42).

In another line of work, Kelly and Wagner [33] demonstrated bounds on the error exponent for the WAK problem with stationary and memoryless source $P_{X^nY^n} = P_{XY}^n$. Here we present only the direct part (lower bound on the error exponent).

Theorem 5 (Kelly-Wagner [33]). There exists a sequence of WAK codes $\{\Phi_n\}_{n=1}^{\infty}$ with rates satisfying

$$\limsup_{n\to\infty}\frac{1}{n}\log|\mathcal{M}_n|\le R_1,\qquad \limsup_{n\to\infty}\frac{1}{n}\log|\mathcal{L}_n|\le R_2, \tag{44}$$

such that the error probabilities satisfy

$$\liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{P_{\mathrm{e}}(\Phi_n)} \ge \eta_{\mathrm{L}}(P_{XY},R_1,R_2) \tag{45}$$

where

$$\eta_{\mathrm{L}}(P_{XY},R_1,R_2) := \min_{Q_Y}\ \max_{Q_{U|Y}}\ \min_{Q_{X|YU}:\,H(Q_X)\ge R_1} D(Q_{XYU}\,\|\,P_{XY}\times Q_{U|Y}) + \begin{cases} \big|R_1+R_2-H(Q_{X|U}|Q_U)-I(Q_Y,Q_{U|Y})\big|^+ & I(Q_Y,Q_{U|Y})\ge R_2 \\ \big|R_1-H(Q_{X|U}|Q_U)\big|^+ & I(Q_Y,Q_{U|Y})< R_2 \end{cases} \tag{46}$$

The proof in [33] is based on the method of types [30]. The helper encoder quantizes its observation $Y^n$ using the test channel $Q_{U|Y}$. It sends the quantization index and the type of $Y^n$. Then, the helper encoder optionally uses binning for the quantized sequence. The primary encoder uses binning for each source type class if necessary. This corresponds to the two cases in (46). The decoder finds the sequence in the specified bin with the specified type that has the smallest empirical conditional entropy (conditioned on the quantized side sequence). Thus, the code is a universally attainable one.



B. Existing Results for the WZ Problem 

As in the previous section, we state the asymptotic WZ rate-distortion function. Subsequently, we state some existing non-asymptotic bounds on the probability of excess distortion defined in (14).

Let $\mathscr{P}_D(P_{XY})$ be the set of all pairs $(P_{UXY},g)$ where $P_{UXY}\in\mathscr{P}(\mathcal{U}\times\mathcal{X}\times\mathcal{Y})$ is a joint distribution and $g:\mathcal{U}\times\mathcal{Y}\to\hat{\mathcal{X}}$ is a (reproduction) function such that the $\mathcal{X}\times\mathcal{Y}$-marginal of $P_{UXY}$ is the source distribution $P_{XY}$, $U - X - Y$ forms a Markov chain in that order, $|\mathcal{U}|\le|\mathcal{X}|+1$ and the distortion constraint is satisfied, i.e.,

$$\mathbb{E}[d(X,g(U,Y))] = \sum_{u,x,y} P_{UXY}(u,x,y)\,d(x,g(u,y)) \le D. \tag{47}$$

Define the function

$$R^*_{\mathrm{WZ}}(D) := \min_{(P_{UXY},g)\in\mathscr{P}_D(P_{XY})} I(U;X) - I(U;Y). \tag{48}$$

Note from Markovity that $I(U;X) - I(U;Y) = I(U;X|Y)$. Then, we have the following asymptotic characterization of the WZ rate-distortion function.
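The Markovity identity $I(U;X) - I(U;Y) = I(U;X|Y)$ can be checked numerically. The sketch below does so for a toy distribution in which $X$ is a uniform bit and $U$ and $Y$ are obtained by flipping $X$ independently, so that $U - X - Y$ holds. The distribution, the crossover probabilities and the helper names are assumptions made for this illustration only; the conditional mutual information is computed through the chain rule $I(U;X|Y) = I(U;X,Y) - I(U;Y)$.

```python
import math
from itertools import product

def wz_joint(a=0.1, b=0.2):
    """Toy P_UXY with U - X - Y: X ~ Bern(1/2), U = X xor Bern(a), Y = X xor Bern(b)."""
    pmf = {}
    for u, x, y in product((0, 1), repeat=3):
        pmf[(u, x, y)] = 0.5 * (a if u != x else 1 - a) * (b if y != x else 1 - b)
    return pmf

def marginal(pmf, keep):
    """Marginalize a pmf keyed by (u, x, y) onto the coordinates in `keep`."""
    out = {}
    for key, p in pmf.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

def mutual_information(pmf, a_idx, b_idx):
    """I(A;B) in bits, where A and B are tuples of coordinate indices."""
    p_ab = marginal(pmf, a_idx + b_idx)
    p_a = marginal(pmf, a_idx)
    p_b = marginal(pmf, b_idx)
    total = 0.0
    for key, p in p_ab.items():
        if p > 0:
            ka, kb = key[:len(a_idx)], key[len(a_idx):]
            total += p * math.log2(p / (p_a[ka] * p_b[kb]))
    return total

def conditional_mi(pmf, a_idx, b_idx, c_idx):
    """I(A;B|C) via the chain rule I(A;B,C) - I(A;C)."""
    return mutual_information(pmf, a_idx, b_idx + c_idx) - mutual_information(pmf, a_idx, c_idx)

pmf = wz_joint()
U, X, Y = (0,), (1,), (2,)
lhs = mutual_information(pmf, U, X) - mutual_information(pmf, U, Y)
rhs = conditional_mi(pmf, U, X, Y)
```

The two quantities agree because, under $U - X - Y$, $I(U;X,Y) = I(U;X)$; for a distribution violating the Markov chain they would differ in general.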

Theorem 6 (Wyner-Ziv [4]). We have

$$R_{\mathrm{WZ}}(D) = R^*_{\mathrm{WZ}}(D), \tag{49}$$

where $R_{\mathrm{WZ}}(D)$ is defined in (22).

The direct part of the proof of the theorem in the original Wyner-Ziv paper [4] is based on the average fidelity criterion in (23). It relies on the compress-bin idea. That is, binning is used to reduce the rate of the description of the main source to the receiver. The encoder transmits the bin index and the decoder searches within that bin for the transmitted codeword. The reproduction function $g$ is then used to reproduce the source to within a distortion $D$. To prove Theorem 6 for the probability of excess distortion criterion, we can use the bounds provided in the following (e.g., Theorems 7 or 8), which, at a high level, are based on the same ideas as in [4]. The converse in [4] was proved for the average fidelity criterion in (23) but can be adapted for the probability of excess distortion criterion by noting that for a sequence of rate-$R$ codes$^3$ $\{\Phi_n\}_{n=1}^{\infty}$ for which $P_{\mathrm{e}}(\Phi_n;D)\to 0$,

\begin{align}
&\limsup_{n\to\infty}\mathbb{E}\big[d_n(X^n,\varphi_n(f_n(X^n),Y^n))\big] \notag\\
&\quad= \limsup_{n\to\infty}\int_0^{D_{\max}} \Pr\big(d_n(X^n,\varphi_n(f_n(X^n),Y^n)) > t\big)\,\mathrm{d}t \tag{50}\\
&\quad\le \int_0^{D_{\max}} \limsup_{n\to\infty}\Pr\big(d_n(X^n,\varphi_n(f_n(X^n),Y^n)) > t\big)\,\mathrm{d}t \tag{51}\\
&\quad= \int_0^{D} \limsup_{n\to\infty}\Pr\big(d_n(X^n,\varphi_n(f_n(X^n),Y^n)) > t\big)\,\mathrm{d}t \le D \tag{52}
\end{align}

$^3$ This means that the limsup of $R(\Phi_n)$ defined in (BJ is no larger than $R$. Also see J58t ,

where (51) is by Fatou's lemma (the space $[0,D_{\max}]$ has finite Lebesgue measure) and the equality in (52) is by the assumption that $P_{\mathrm{e}}(\Phi_n;t)\to 0$ for every $t\in(D,D_{\max}]$. As such, by standard manipulations (see [1, Thm. 11.3]), $R_{\mathrm{WZ}}(D)\ge R^*_{\mathrm{WZ}}(D)$ for the probability of excess distortion criterion. To the best of the authors' knowledge, it is not known whether $R_{\mathrm{WZ}}(\varepsilon,D) = R^*_{\mathrm{WZ}}(D)$ for all $0<\varepsilon<1$, i.e., whether the strong converse holds for the probability of excess distortion criterion.

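The analogue of Theorem 3 can be distilled from the work of Iwata and Muramatsu [9], who derived the general formula for the (fixed-length, maximum-distortion criterion) Wyner-Ziv problem. We rederive their general formula in Section V-B. Before we state the theorem, let us define three sets for fixed $(P_{UXY},g)\in\mathscr{P}_D(P_{XY})$ and non-negative constants $\gamma_{\mathrm{p}}$ and $\gamma_{\mathrm{c}}$:

$$\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}}) := \Big\{(u,y)\in\mathcal{U}\times\mathcal{Y} : \log\frac{P_{Y|U}(y|u)}{P_Y(y)}\ge\gamma_{\mathrm{p}}\Big\} \tag{53}$$

$$\mathcal{T}_{\mathrm{c}}^{\mathrm{WZ}}(\gamma_{\mathrm{c}}) := \Big\{(u,x)\in\mathcal{U}\times\mathcal{X} : \log\frac{P_{X|U}(x|u)}{P_X(x)}\le\gamma_{\mathrm{c}}\Big\} \tag{54}$$

$$\mathcal{T}_{\mathrm{d}}^{\mathrm{WZ}}(D) := \big\{(u,x,y)\in\mathcal{U}\times\mathcal{X}\times\mathcal{Y} : d(x,g(u,y))\le D\big\}. \tag{55}$$

These sets have intuitive explanations: $\mathcal{T}_{\mathrm{c}}^{\mathrm{WZ}}(\gamma_{\mathrm{c}})^c$ represents the covering error, in which $U$ is unable to describe $X$ to the desired level indicated by $\gamma_{\mathrm{c}}$; $\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}})^c$ represents the packing error, in which the decoder is unable to decode the correct codeword $U$ given $Y$ using a threshold test based on the information density statistic and $\gamma_{\mathrm{p}}$; $\mathcal{T}_{\mathrm{d}}^{\mathrm{WZ}}(D)^c$ represents the distortion error, in which the reproduction $\hat{X}$ is not within a distortion of $D$ of the source $X$.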

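Theorem 7 (Refinement of Iwata-Muramatsu [9]). For arbitrary $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$ and an arbitrary positive integer $L$, there exists a WZ code $\Phi$ with probability of excess distortion satisfying

$$P_{\mathrm{e}}(\Phi;D) \le 2\sqrt{P_{UY}\big(\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}})^c\big)} + P_{UX}\big(\mathcal{T}_{\mathrm{c}}^{\mathrm{WZ}}(\gamma_{\mathrm{c}})^c\big) + P_{UXY}\big(\mathcal{T}_{\mathrm{d}}^{\mathrm{WZ}}(D)^c\big) + \frac{L}{2^{\gamma_{\mathrm{p}}}|\mathcal{M}|} + \exp\Big\{-\frac{L}{2^{\gamma_{\mathrm{c}}}}\Big\}. \tag{56}$$

The quantity $L/|\mathcal{M}|$ in (56) can be interpreted as the size of each bin. Recall that in WZ coding [4], binning is used to reduce the rate used to describe the source to the decoder, because the decoder has access to correlated side-information $Y$. Verdú improved on this bound by using a novel decoder and non-asymptotic versions of the packing and covering lemmas in [1]. This removes the factor 2 and the square-root operation in the first term in (56). These operations, which loosen the bound in (56), result from the use of Wyner's PBL and Markov lemma.

Theorem 8 (Verdú [6]). For arbitrary $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$ and an arbitrary positive integer $L$, there exists a WZ code $\Phi$ with probability of excess distortion satisfying

$$P_{\mathrm{e}}(\Phi;D) \le P_{UY}\big(\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}})^c\big) + P_{UX}\big(\mathcal{T}_{\mathrm{c}}^{\mathrm{WZ}}(\gamma_{\mathrm{c}})^c\big) + P_{UXY}\big(\mathcal{T}_{\mathrm{d}}^{\mathrm{WZ}}(D)^c\big) + \frac{L}{2^{\gamma_{\mathrm{p}}}|\mathcal{M}|} + \exp\Big\{-\frac{L}{2^{\gamma_{\mathrm{c}}}}\Big\}. \tag{57}$$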

The salient terms in (57) are the first three. The first two terms are information spectrum terms, which can be evaluated easily in the i.i.d. setting. The third term, though not an information spectrum term, can also be bounded easily. In Section IV-B, we improve on Verdú's bound in (57) by placing all three salient terms under a single probability. Thus, the second-order coding rate derived from our bound is no worse than that derived from (57). In fact, one of our contributions is to show that the dispersion (or second-order coding rate) of lossy source coding derived by Ingber-Kochman [27] and Kostina-Verdú [28] can be derived from our non-asymptotic channel-simulation-type bound. See Theorem 29 in Section VI-B.

Kelly-Wagner [33] also derived bounds on the error exponent for the probability of excess distortion in the WZ problem with stationary and memoryless source $P_{X^nY^n} = P_{XY}^n$. The achievability result can be stated as follows:

Theorem 9 (Kelly-Wagner [33]). For a fixed sequence of WZ codes $\{\Phi_n\}_{n=1}^{\infty}$ with rates

$$\limsup_{n\to\infty}\frac{1}{n}\log|\mathcal{M}_n|\le R, \tag{58}$$

let the error exponent for the probabilities of excess distortion of $\{\Phi_n\}_{n=1}^{\infty}$ be defined as

$$\theta(R,D,P_{XY}) := \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{P_{\mathrm{e}}(\Phi_n)}. \tag{59}$$

There exists a sequence of WZ codes $\{\Phi_n\}_{n=1}^{\infty}$ with rates satisfying (58) for which the error exponent in (59) is lower bounded as

$$\theta(R,D,P_{XY}) \ge \min_{Q_X}\ \max_{Q_{U|X}}\ \min_{Q_Y}\ \max_{g\in\mathcal{G}}\ \min_{Q_{UXY}} J(Q_{UXY},P_{XY},g,R,D) \tag{60}$$

where the set $\mathcal{G} := \{g:\mathcal{U}\times\mathcal{Y}\to\hat{\mathcal{X}}\}$, the auxiliary alphabet $\mathcal{U}$ takes on finitely many values and

$$J(Q_{UXY},P_{XY},g,R,D) := \begin{cases} D(Q_{UXY}\,\|\,P_{XY}\times Q_{U|X}) & \mathbb{E}_{Q_{UXY}}[d(X,g(U,Y))] > D \\ D(Q_{UXY}\,\|\,P_{XY}\times Q_{U|X}) + \big|R - I(Q_X,Q_{U|X}) + I(Q_Y,Q_{U|Y})\big|^+ & \mathbb{E}_{Q_{UXY}}[d(X,g(U,Y))]\le D,\ I(Q_X,Q_{U|X})\ge R \\ \infty & \text{otherwise.} \end{cases} \tag{61}$$

Note that in the final minimization over $Q_{UXY}$, the marginals $Q_{UX}$ and $Q_Y$ are fixed to be those specified earlier in the optimization.

The proof of Theorem 9 is similar to that of Theorem 5 and makes heavy use of the method of types. Roughly speaking, the two cases in (61) correspond to whether binning is necessary based on the realized source type $Q_X\in\mathscr{P}_n(\mathcal{X})$. Also, for every source type $Q_X$, we can optimize over the test channel (shell) $Q_{U|X}\in\mathscr{V}_n(\mathcal{U};Q_X)$. For every side-information type $Q_Y\in\mathscr{P}_n(\mathcal{Y})$, we can optimize over the reproduction function $g$. This explains the first four optimizations in (60). The final minimization represents an adversarial consistent joint distribution for the triple $(U,X,Y)\sim Q_{UXY}$. This result, just like the WAK one in Theorem 5, has a game-theoretic interpretation and the order of plays matches the problem.

C. Existing Results for the GP Problem 

We conclude this section by stating existing results for the GP problem [5]. Recall that in the GP problem, we have a channel $W:\mathcal{X}\times\mathcal{S}\to\mathcal{Y}$ and a state distribution $P_S\in\mathscr{P}(\mathcal{S})$. Assume for simplicity that all alphabets are finite sets. Let $\mathscr{P}_\Gamma(W,P_S)$ be the collection of all joint distributions $P_{UXSY}\in\mathscr{P}(\mathcal{U}\times\mathcal{X}\times\mathcal{S}\times\mathcal{Y})$ such that the $\mathcal{S}$-marginal is $P_S$, the conditional distribution $P_{Y|XS} = W$, $U - (X,S) - Y$ forms a Markov chain in that order,

$$\mathbb{E}[g(X)]\le\Gamma \tag{62}$$

and $|\mathcal{U}|\le\min\{|\mathcal{X}||\mathcal{S}|,\ |\mathcal{S}|+|\mathcal{Y}|-1\}$. Define the quantity

$$C^*_{\mathrm{GP}}(\Gamma) := \max_{P_{UXSY}\in\mathscr{P}_\Gamma(W,P_S)} I(U;Y) - I(U;S), \tag{63}$$

where $I(U;Y)$ and $I(U;S)$ are computed with respect to the joint distribution $P_{UXSY}$. If there is no cost constraint (62), we simply write $C^*_{\mathrm{GP}}$ instead of $C^*_{\mathrm{GP}}(\infty)$. Then, we have the following asymptotic characterization.

Theorem 10 (Gel'fand-Pinsker [5]). If the alphabets $\mathcal{S}$, $\mathcal{X}$ and $\mathcal{Y}$ are discrete, for every $0<\varepsilon<1$, we have

$$C_{\mathrm{GP}}(\varepsilon) = C_{\mathrm{GP}} = C^*_{\mathrm{GP}} \tag{64}$$

where $C_{\mathrm{GP}}(\varepsilon)$ and $C_{\mathrm{GP}}$ are defined in (35) and (36) respectively.

The direct part was proved using a covering-packing argument as well as the conditional typicality lemma (using the notion of strong typicality). Essentially, each message $m\in\mathcal{M}$ is uniquely associated to a subcodebook of size $L$. To send message $m$, the encoder looks in the $m$-th subcodebook for a codeword that is jointly typical with the noncausal state. The decoder then searches for the unique subcodebook which contains at least one codeword that is jointly typical with the channel output. The weak converse in the original Gel'fand-Pinsker paper was proved using the Csiszár sum identity. See [1, Thm. 7.3]. Tyagi and Narayan proved a strong converse [36] using entropy and image-size characterizations via judicious choices of auxiliary channels. Their proof only applies to discrete memoryless channels with discrete state distribution without cost constraints.

A recent non-asymptotic bound for the average error probability was proved by Tan [10] without the cost constraint (24). To state the bound, we define the sets

$$\mathcal{T}_{\mathrm{p}}^{\mathrm{GP}}(\gamma_{\mathrm{p}}) := \Big\{(u,y)\in\mathcal{U}\times\mathcal{Y} : \log\frac{P_{Y|U}(y|u)}{P_Y(y)}\ge\gamma_{\mathrm{p}}\Big\} \tag{65}$$

$$\mathcal{T}_{\mathrm{c}}^{\mathrm{GP}}(\gamma_{\mathrm{c}}) := \Big\{(u,s)\in\mathcal{U}\times\mathcal{S} : \log\frac{P_{S|U}(s|u)}{P_S(s)}\le\gamma_{\mathrm{c}}\Big\}. \tag{66}$$

These are analogous to the typical sets used extensively in network information theory [1] but they only involve the information densities. The first set in (65) represents the packing event while the second in (66) represents the covering event. Also notice the similarities to $\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}})$ and $\mathcal{T}_{\mathrm{c}}^{\mathrm{WZ}}(\gamma_{\mathrm{c}})$ in (53) and (54) for the WZ problem. However, now $S$ plays the role of $X$ while $Y$ has the same interpretation in both the WZ and GP problems. With these definitions, Tan [10] proved the following non-asymptotic characterization for the average error probability in the GP problem.

Theorem 11 (Tan [10]). For arbitrary $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$ and an arbitrary positive integer $L$, there exists a GP code $\Phi$ with average probability of error satisfying

$$P_{\mathrm{e}}(\Phi) \le 2\sqrt{P_{UY}\big(\mathcal{T}_{\mathrm{p}}^{\mathrm{GP}}(\gamma_{\mathrm{p}})^c\big)} + P_{US}\big(\mathcal{T}_{\mathrm{c}}^{\mathrm{GP}}(\gamma_{\mathrm{c}})^c\big) + \frac{L|\mathcal{M}|}{2^{\gamma_{\mathrm{p}}}} + \exp\Big\{-\frac{L}{2^{\gamma_{\mathrm{c}}}}\Big\}. \tag{67}$$

The parameter $L$ again represents the number of codewords in each subcodebook. The first two terms are information spectrum terms which can be evaluated fairly easily in the $n$-shot setting. The proof of (67) again uses Wyner's PBL and the Markov lemma [2]. Verdú [6] presented a tightened version of the above bound for the case where there are no cost constraints.

Theorem 12 (Verdú [6]). For arbitrary $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$ and an arbitrary positive integer $L$, there exists a GP code $\Phi$ with average probability of error satisfying

$$P_{\mathrm{e}}(\Phi) \le P_{UY}\big(\mathcal{T}_{\mathrm{p}}^{\mathrm{GP}}(\gamma_{\mathrm{p}})^c\big) + P_{US}\big(\mathcal{T}_{\mathrm{c}}^{\mathrm{GP}}(\gamma_{\mathrm{c}})^c\big) + \frac{L|\mathcal{M}|}{2^{\gamma_{\mathrm{p}}}} + \exp\Big\{-\frac{L}{2^{\gamma_{\mathrm{c}}}}\Big\}. \tag{68}$$

To prove (68), Verdú considered the function

$$\zeta(s,u) := \Pr\Big\{\log\frac{P_{Y|U}(Y|u)}{P_Y(Y)} < \gamma_{\mathrm{p}} \,\Big|\, (U,S) = (u,s)\Big\}. \tag{69}$$

To send message $m$, the encoder, given the noncausal state sequence $s$, searches within the $m$-th subcodebook for the codeword $u_{ml}$ that minimizes $\zeta(s,u_{ml})$. The decoder is Feinstein-like [50]: it finds a codeword whose information density $\log\frac{P_{Y|U}(y|u)}{P_Y(y)}$ exceeds a prescribed threshold $\gamma_{\mathrm{p}}$, and the subcodebook index of that codeword is declared as the sent message. In Section IV-C, by using a technique based on channel simulation, we show that the information spectrum terms in (68) can, in fact, be placed under a single probability. Thus, the derived achievable second-order coding rate is better than that obtained using Verdú's bound in (68).

When the channel and state are discrete, memoryless and stationary, i.e., $W^n(y^n|x^n,s^n) = \prod_{i=1}^{n} W(y_i|x_i,s_i)$ and $P_{S^n}(s^n) = \prod_{i=1}^{n} P_S(s_i)$, error exponents can be derived. Indeed, a random coding error exponent for the GP problem was presented by Moulin and Wang [35] for the case without cost constraints. We state a simplified version of the main result in [35] here.

Theorem 13 (Moulin-Wang [35]). There exists a sequence of GP codes $\{\Phi_n\}_{n=1}^{\infty}$ of rates

$$\liminf_{n\to\infty}\frac{1}{n}\log|\mathcal{M}_n|\ge R \tag{70}$$

for which the average error probabilities satisfy

$$\liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{P_{\mathrm{e}}(\Phi_n)} \ge \eta_{\mathrm{L}}(W,P_S,R) \tag{71}$$

where, letting the auxiliary alphabet $\mathcal{U}$ take on finitely many values, we have

$$\eta_{\mathrm{L}}(W,P_S,R) := \min_{Q_S}\ \max_{Q_{XU|S}}\ \min_{Q_{Y|XUS}} \Big[ D\big(Q_S\times Q_{XU|S}\times Q_{Y|XUS}\,\big\|\,P_S\times Q_{XU|S}\times W\big) + \big|I_Q(U;Y) - I_Q(U;S) - R\big|^+ \Big]. \tag{72}$$

Note that $I_Q(U;Y)$ and $I_Q(U;S)$ are the mutual information quantities computed with respect to $Q_{SXUY} := Q_S\times Q_{XU|S}\times Q_{Y|XUS}$.

Like Theorem 9, this result has a game-theoretic interpretation. Nature chooses an adversarial state sequence represented by the type $Q_S\in\mathscr{P}_n(\mathcal{S})$ and the player can optimize for the best test channel $Q_{XU|S}\in\mathscr{V}_n(\mathcal{X}\times\mathcal{U};Q_S)$ to find a good channel input. Nature then chooses an adversarial channel $Q_{Y|XUS}\in\mathscr{V}_n(\mathcal{Y};Q_{XU|S}Q_S)$. The order of plays matches the order of the optimizations in (72). It is worth mentioning that a sphere-packing bound (upper bound on the error exponent) was derived by Tyagi and Narayan [36] by appealing to their strong converse and the Haroutunian change of measure technique [51].



IV. Main Results: Novel Non-Asymptotic Achievability Bounds 

In this section, we describe our results concerning novel non-asymptotic achievability bounds for the WAK, WZ and GP problems. We show using ideas from channel resolvability [7, Ch. 6], [13], [14] and channel simulation [15]-[19] that the bounds obtained in Theorems 4, 8 and 12 can be refined so as to obtain better second-order coding rates. The definitions of and techniques involving channel resolvability and channel simulation are reviewed in Appendices B and C respectively. These are concepts that form crucial components of the proofs of the Channel-Simulation-type (CS-type) bounds in the sequel.

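The following quantity, introduced in [14], will be used extensively in this section so we provide its definition here. For a joint distribution $P_{UY}\in\mathscr{P}(\mathcal{U}\times\mathcal{Y})$ and a positive constant $\gamma_{\mathrm{c}}$, define

\begin{align}
\Delta(\gamma_{\mathrm{c}},P_{UY}) &:= \sum_{(u,y)\in\mathcal{U}\times\mathcal{Y}} \frac{P_U(u)\,P_{Y|U}(y|u)^2}{P_Y(y)}\,\mathbf{1}\Big[\frac{P_{Y|U}(y|u)}{P_Y(y)}\le 2^{\gamma_{\mathrm{c}}}\Big] \tag{73}\\
&= \mathbb{E}_{P_{UY}}\Big[\frac{P_{Y|U}(Y|U)}{P_Y(Y)}\,\mathbf{1}\Big[\frac{P_{Y|U}(Y|U)}{P_Y(Y)}\le 2^{\gamma_{\mathrm{c}}}\Big]\Big]. \tag{74}
\end{align}

Observe from (74) that $\Delta(\gamma_{\mathrm{c}},P_{UY})$ has the property that

$$\Delta(\gamma_{\mathrm{c}},P_{UY}) \le 2^{\gamma_{\mathrm{c}}}. \tag{75}$$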
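The following Python sketch evaluates the truncated second-moment quantity $\Delta(\gamma_{\mathrm{c}},P_{UY})$ for a small joint distribution and checks the bound (75). It assumes the form of $\Delta$ as an expectation of the likelihood ratio $P_{Y|U}/P_Y$ restricted to the event that the ratio is at most $2^{\gamma_{\mathrm{c}}}$; the toy distribution and the function name `delta` are illustrative assumptions only.

```python
def delta(p_uy, gamma_c):
    """Truncated expectation of P_{Y|U}(Y|U)/P_Y(Y) over {ratio <= 2^gamma_c}.

    p_uy: dict mapping (u, y) to a joint probability.
    """
    p_u, p_y = {}, {}
    for (u, y), p in p_uy.items():
        p_u[u] = p_u.get(u, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    total = 0.0
    for (u, y), p in p_uy.items():
        if p == 0.0:
            continue
        ratio = p / (p_u[u] * p_y[y])  # equals P_{Y|U}(y|u) / P_Y(y)
        if ratio <= 2.0 ** gamma_c:
            total += p * ratio
    return total

# Hypothetical doubly symmetric binary pair with crossover probability 0.1.
p_uy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
delta_wide = delta(p_uy, gamma_c=1.0)    # threshold 2: every pair is kept
delta_narrow = delta(p_uy, gamma_c=0.5)  # threshold ~1.41: only ratio 0.2 kept
```

Since each retained ratio is at most $2^{\gamma_{\mathrm{c}}}$ and the weights $P_{UY}(u,y)$ sum to at most one, the computed value can never exceed $2^{\gamma_{\mathrm{c}}}$, which is exactly the property (75).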

A. Novel Non-Asymptotic Achievability Bound for the WAK Problem 

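Fix an auxiliary alphabet $\mathcal{U}$ and a joint distribution $P_{UXY}\in\mathscr{P}(P_{XY})$. See the definition of $\mathscr{P}(P_{XY})$ prior to (37). Also recall the definitions of the sets $\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})$ and $\mathcal{T}_{\mathrm{c}}^{\mathrm{WAK}}(\gamma_{\mathrm{c}})$ for the WAK problem in (39) and (40) respectively. The following is our CS-type bound.

Theorem 14 (CS-type bound for WAK coding). For arbitrary $\gamma_{\mathrm{b}},\gamma_{\mathrm{c}}\ge 0$ and $\delta>0$, there exists a WAK code $\Phi$ with error probability satisfying

\begin{align}
P_{\mathrm{e}}(\Phi) &\le P_{UXY}\big[(u,x)\in\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})^c \,\cup\, (u,y)\in\mathcal{T}_{\mathrm{c}}^{\mathrm{WAK}}(\gamma_{\mathrm{c}})^c\big] \notag\\
&\quad + \frac{1}{|\mathcal{M}|}\sum_{(u,x)\in\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})} P_U(u) + \sqrt{\frac{\Delta(\gamma_{\mathrm{c}},P_{UY})}{|\mathcal{L}|}} + \delta. \tag{76}
\end{align}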

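See Appendix D for the proof of Theorem 14. Observe that the primary novelty of the bound in (76) lies in the fact that both error events $\{(u,x)\in\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})^c\}$ and $\{(u,y)\in\mathcal{T}_{\mathrm{c}}^{\mathrm{WAK}}(\gamma_{\mathrm{c}})^c\}$ lie under the same probability and so can be bounded together (as a vector) in second-order coding analysis. See Section VI-A. Notice that the sum of the information spectrum terms (first two terms) in Verdú's bound in (42) is the result upon invoking the union bound on the first term in (76). We illustrate the differences in the resulting second-order coding rates numerically in Section VII. The bound in (76) is rather unwieldy. We can simplify it without losing too much. Indeed, using the definition of $\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})$, we observe that the term in (76) involving $|\mathcal{M}|$ can be bounded as

\begin{align}
\frac{1}{|\mathcal{M}|}\sum_{(u,x)\in\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})} P_U(u) &= \frac{1}{|\mathcal{M}|}\sum_{(u,x)\in\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})} P_U(u)\,\frac{P_{X|U}(x|u)}{P_{X|U}(x|u)} \tag{77}\\
&\le \frac{1}{|\mathcal{M}|}\sum_{(u,x)\in\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})} P_U(u)\,P_{X|U}(x|u)\,2^{\gamma_{\mathrm{b}}} \tag{78}\\
&\le \frac{2^{\gamma_{\mathrm{b}}}}{|\mathcal{M}|}. \tag{79}
\end{align}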

Together with (75), we have the following simplified CS-type bound, which resembles a Feinstein-type [50] achievability bound.

Corollary 15 (Simplified CS-type bound for WAK coding). For arbitrary $\gamma_{\mathrm{b}},\gamma_{\mathrm{c}}\ge 0$ and $\delta>0$, there exists a WAK code $\Phi$ with error probability satisfying

$$P_{\mathrm{e}}(\Phi) \le P_{UXY}\big[(u,x)\in\mathcal{T}_{\mathrm{b}}^{\mathrm{WAK}}(\gamma_{\mathrm{b}})^c \,\cup\, (u,y)\in\mathcal{T}_{\mathrm{c}}^{\mathrm{WAK}}(\gamma_{\mathrm{c}})^c\big] + \frac{2^{\gamma_{\mathrm{b}}}}{|\mathcal{M}|} + \sqrt{\frac{2^{\gamma_{\mathrm{c}}}}{|\mathcal{L}|}} + \delta. \tag{80}$$

If $(X^n,Y^n)$ is drawn from the product distribution $P_{XY}^n$, then by designing $\gamma_{\mathrm{b}}$ and $\gamma_{\mathrm{c}}$ appropriately, we see that the dominating term in (80) is the first one. The other terms vanish with $n$. In particular, $\delta$ stems from the amount of common randomness known to all parties and since the amount of common randomness can be arbitrarily large, $\delta$ can be made arbitrarily small. In addition, $\Delta$ in (73) results from approximating an arbitrary distribution with one that is simulated by a channel [14, Lem. 2]. See Appendix B.
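The gain from placing both error events under a single probability can be seen on a toy example. The Python sketch below compares the probability of the union of the binning and covering error events with the sum of their individual probabilities, which is what a union bound over the two information spectrum terms would give. The joint distribution (uniform binary $Y$, with $U$ and $X$ obtained by flipping $Y$ with probabilities 0.1 and 0.2), the thresholds and the function names are assumptions for illustration; logarithms are to base 2.

```python
import math
from itertools import product

def toy_pmf(q_u=0.1, q_x=0.2):
    """Toy P_UXY with U - Y - X: Y ~ Bern(1/2); U and X are noisy copies of Y."""
    return {(u, x, y): 0.5 * (q_u if u != y else 1 - q_u) * (q_x if x != y else 1 - q_x)
            for u, x, y in product((0, 1), repeat=3)}

def joint_vs_union(pmf, gamma_b, gamma_c):
    """Return (P[binning error or covering error], P[binning] + P[covering])."""
    p_u, p_y, p_ux, p_uy = {}, {}, {}, {}
    for (u, x, y), p in pmf.items():
        p_u[u] = p_u.get(u, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
        p_ux[(u, x)] = p_ux.get((u, x), 0.0) + p
        p_uy[(u, y)] = p_uy.get((u, y), 0.0) + p
    joint = prob_b = prob_c = 0.0
    for (u, x, y), p in pmf.items():
        # Binning error: entropy density log 1/P_{X|U}(x|u) exceeds gamma_b.
        err_b = math.log2(p_u[u] / p_ux[(u, x)]) > gamma_b
        # Covering error: information density log P_{Y|U}/P_Y exceeds gamma_c.
        err_c = math.log2(p_uy[(u, y)] / (p_u[u] * p_y[y])) > gamma_c
        if err_b or err_c:
            joint += p
        if err_b:
            prob_b += p
        if err_c:
            prob_c += p
    return joint, prob_b + prob_c

joint, union = joint_vs_union(toy_pmf(), gamma_b=1.0, gamma_c=0.5)
```

With these deliberately loose thresholds the sum of the two marginal terms exceeds one, while the joint probability stays below one. The single-letter example only illustrates the inequality; the second-order gains discussed in the text concern the $n$-fold setting.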

By modifying the helper in the proof of Theorem 14, we can show the following theorem.

Theorem 16 (Modified CS-type bound for WAK coding). For arbitrary $\gamma_{\mathrm{b}},\gamma_{\mathrm{c}}\ge 0$, $\delta>0$ and positive integer $J$, there exists a WAK code with error probability satisfying



Pe($) < Puxy [(u,x) G 7^ WAK ( 7 b) c U (u, y) G 7; WAK ( 7c 



1 1 («,x)er b WAK ( 7b ) 1 (n,£)eT b WAK ( 7 b) 

See Appendix E for the proof of Theorem 16. By letting $J = |\mathcal{L}|$ in (81), we recover (76) up to an additional residual term, which is unimportant in second-order analysis. A close inspection of the proof reveals that the additional term is due to additional random bin coding at the helper, which is not needed if $J = |\mathcal{L}|$.

Remark 1. For the special case in which the test channel $P_{U|Y}$ is noiseless, we can show that there exists a WAK code satisfying

P.(*) < Pxy [(*, y) G 7; WAK ( 7 b) c U 7; WAK (7 S ) C ] + + r^cj (82) 



for any 7b, 7s > 0, where 



t;wak (7s) := i M eXxy: log _i__ < 7s }, . (83) 



We can prove the bound (1821 ) by using the standard Slepian-Wolf type bin coding for both the main encoder and the 
helper H25\l . MZI/- As it will turn out later in Section \VI-A\ this simple bound gives tighter second-order achievability 
in some cases. 



B. Novel Non-Asymptotic Achievability Bound for the WZ Problem 

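We now turn our attention to the WZ problem, where we derive a bound similar to that in Theorem 14. This improves on Verdú's bound in Theorem 8. It again uses the same CS idea for the covering part. Recall the definitions of the sets $\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}})$, $\mathcal{T}_{\mathrm{c}}^{\mathrm{WZ}}(\gamma_{\mathrm{c}})$ and $\mathcal{T}_{\mathrm{d}}^{\mathrm{WZ}}(D)$ in (53), (54) and (55) respectively. In the following, $P_{UXY}\in\mathscr{P}(\mathcal{U}\times\mathcal{X}\times\mathcal{Y})$ satisfying (i) the $\mathcal{X}\times\mathcal{Y}$-marginal of $P_{UXY}$ is $P_{XY}$ and (ii) $U - X - Y$ forms a Markov chain is fixed.

Theorem 17 (CS-type bound for WZ coding). For arbitrary constants $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$, $\delta>0$ and positive integer $L$, there exists a WZ code $\Phi$ with probability of excess distortion satisfying

\begin{align}
P_{\mathrm{e}}(\Phi;D) &\le P_{UXY}\big[(u,y)\in\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}})^c \,\cup\, (u,x)\in\mathcal{T}_{\mathrm{c}}^{\mathrm{WZ}}(\gamma_{\mathrm{c}})^c \,\cup\, (u,x,y)\in\mathcal{T}_{\mathrm{d}}^{\mathrm{WZ}}(D)^c\big] \notag\\
&\quad + \frac{L}{|\mathcal{M}|}\sum_{(u,y)\in\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}})} P_U(u)P_Y(y) + \sqrt{\frac{\Delta(\gamma_{\mathrm{c}},P_{UX})}{L}} + \delta, \tag{84}
\end{align}

where $\Delta(\gamma_{\mathrm{c}},P_{UX})$ is defined in (73).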

The proof of Theorem 17 is provided in Appendix F. As with Theorem 14, the main novelty of our bound lies in the fact that the three error events lie under the same probability, making it amenable to treating all three error events jointly. The residual terms in (84) (namely, the second, third and fourth terms) are relatively small with a proper choice of constants $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$, $\delta>0$ and $L\in\mathbb{N}$, as we shall see in the sequel. We can again relax the somewhat cumbersome second and third terms in (84) by noting the definition of $\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}})$ and by going through the same steps to upper bound $\Delta$; cf. (75). We thus obtain:

Corollary 18 (Simplified CS-type bound for WZ coding). For arbitrary constants $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$, $\delta>0$ and positive integer $L$, there exists a WZ code $\Phi$ with probability of excess distortion satisfying

\begin{align}
P_{\mathrm{e}}(\Phi;D) &\le P_{UXY}\big[(u,y)\in\mathcal{T}_{\mathrm{p}}^{\mathrm{WZ}}(\gamma_{\mathrm{p}})^c \,\cup\, (u,x)\in\mathcal{T}_{\mathrm{c}}^{\mathrm{WZ}}(\gamma_{\mathrm{c}})^c \,\cup\, (u,x,y)\in\mathcal{T}_{\mathrm{d}}^{\mathrm{WZ}}(D)^c\big] \notag\\
&\quad + \frac{L}{2^{\gamma_{\mathrm{p}}}|\mathcal{M}|} + \sqrt{\frac{2^{\gamma_{\mathrm{c}}}}{L}} + \delta. \tag{85}
\end{align}

To obtain achievable second-order coding rates for the WZ problem, we evaluate the bound in (85) for appropriate choices of $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$, $\delta>0$ and $L\in\mathbb{N}$ in Section VI-B. Since the lossy source coding problem is a special case of WZ coding, we use a specialization of the bound in (85) to derive an achievable dispersion (or second-order coding rate) of lossy source coding [27], [28], which turns out to be tight.



C. Novel Non-Asymptotic Achievability Bound for the GP Problem 

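This section presents a novel non-asymptotic achievability bound for the GP problem, which is the dual of the WZ problem [49]. Our bound improves on both Theorems 11 and 12 and uses the same channel-simulation idea for the covering part. Recall the definitions of the sets $\mathcal{T}_{\mathrm{p}}^{\mathrm{GP}}(\gamma_{\mathrm{p}})$, $\mathcal{T}_{\mathrm{c}}^{\mathrm{GP}}(\gamma_{\mathrm{c}})$ and $\mathcal{T}_{\mathrm{g}}^{\mathrm{GP}}(\Gamma)$ in (65), (66) and (26) respectively. In the following, $P_{USXY}\in\mathscr{P}(\mathcal{U}\times\mathcal{S}\times\mathcal{X}\times\mathcal{Y})$ satisfying (i) the $\mathcal{S}$-marginal of $P_{USXY}$ is $P_S$, (ii) $P_{Y|XS} = W$ and (iii) $U - (X,S) - Y$ forms a Markov chain is fixed.

Theorem 19 (CS-type bound for GP coding). For arbitrary constants $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$, $\delta>0$ and positive integer $L$, there exists a GP code $\Phi$ with average error probability satisfying

\begin{align}
P_{\mathrm{e}}(\Phi;\Gamma) &\le P_{USXY}\big[(u,y)\in\mathcal{T}_{\mathrm{p}}^{\mathrm{GP}}(\gamma_{\mathrm{p}})^c \,\cup\, (u,s)\in\mathcal{T}_{\mathrm{c}}^{\mathrm{GP}}(\gamma_{\mathrm{c}})^c \,\cup\, x\in\mathcal{T}_{\mathrm{g}}^{\mathrm{GP}}(\Gamma)^c\big] \notag\\
&\quad + L|\mathcal{M}|\sum_{(u,y)\in\mathcal{T}_{\mathrm{p}}^{\mathrm{GP}}(\gamma_{\mathrm{p}})} P_U(u)P_Y(y) + \sqrt{\frac{\Delta(\gamma_{\mathrm{c}},P_{US})}{L}} + \delta \tag{86}
\end{align}

where $\Delta(\gamma_{\mathrm{c}},P_{US})$ is defined in (73).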

The proof of Theorem 19 is provided in Appendix G. Notice that unlike the existing asymptotic and non-asymptotic results for GP coding (Theorems 11, 12 and 13), the channel input $x$ satisfies the cost constraint (24) or its almost sure equivalent (cf. Proposition 1). Direct application of (75) to bound $\Delta(\gamma_{\mathrm{c}},P_{US})$ and the definition of $\mathcal{T}_{\mathrm{p}}^{\mathrm{GP}}(\gamma_{\mathrm{p}})$ in (65) yields the following:

Corollary 20 (Simplified CS-type bound for GP coding). For arbitrary constants $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$, $\delta>0$ and positive integer $L$, there exists a GP code $\Phi$ with average error probability satisfying

$$P_{\mathrm{e}}(\Phi;\Gamma) \le P_{USXY}\big[(u,y)\in\mathcal{T}_{\mathrm{p}}^{\mathrm{GP}}(\gamma_{\mathrm{p}})^c \,\cup\, (u,s)\in\mathcal{T}_{\mathrm{c}}^{\mathrm{GP}}(\gamma_{\mathrm{c}})^c \,\cup\, x\in\mathcal{T}_{\mathrm{g}}^{\mathrm{GP}}(\Gamma)^c\big] + \frac{L|\mathcal{M}|}{2^{\gamma_{\mathrm{p}}}} + \sqrt{\frac{2^{\gamma_{\mathrm{c}}}}{L}} + \delta. \tag{87}$$

To obtain achievable second-order coding rates for the GP problem, we evaluate the bound in (87) for appropriate choices of $\gamma_{\mathrm{p}},\gamma_{\mathrm{c}}\ge 0$, $\delta>0$ and $L\in\mathbb{N}$ in Section VI-C.



V. General Formulas 

In this section, we use the simplified CS-type bounds in Corollaries 15, 18 and 20 to derive achievable general formulas for the optimal rate region of the WAK problem, the rate-distortion function of the WZ problem and the capacity of the GP problem. This allows us to recover known results in [8]-[10]. By general formula, we mean that we consider sequences of these problems and do not place any underlying structure such as stationarity, memorylessness and ergodicity on the source and channel [7], [22]. To state our results, let us first recall the following probabilistic limit operations. Their properties are similar to the limit superior and limit inferior for numerical sequences in mathematical analysis and are summarized in [7].

Definition 7. Let $\mathbf{U} := \{U_n\}_{n=1}^{\infty}$ be a sequence of real-valued random variables. The limit superior in probability of $\mathbf{U}$ is defined as

$$\operatorname{p-}\limsup_{n\to\infty} U_n := \inf\Big\{\alpha\in\mathbb{R} : \lim_{n\to\infty}\Pr(U_n>\alpha) = 0\Big\}. \tag{88}$$

The limit inferior in probability of $\mathbf{U}$ is defined as

$$\operatorname{p-}\liminf_{n\to\infty} U_n := -\operatorname{p-}\limsup_{n\to\infty}(-U_n). \tag{89}$$

We also recall the following definitions from Han [7]. These definitions play a prominent role in the rest of this section.

Definition 8. Given a pair of stochastic processes $(\mathbf{X},\mathbf{Y}) = \{X^n,Y^n\}_{n=1}^{\infty}$ with joint distributions $\{P_{X^nY^n}\}_{n=1}^{\infty}$, the spectral sup-mutual information rate is defined as

$$\overline{I}(\mathbf{X};\mathbf{Y}) := \operatorname{p-}\limsup_{n\to\infty}\frac{1}{n}\log\frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)}. \tag{90}$$

The spectral inf-mutual information rate $\underline{I}(\mathbf{X};\mathbf{Y})$ is defined as in (90) with p-liminf in place of p-limsup. The spectral sup- and inf-conditional mutual information rates are defined similarly. The spectral sup-conditional entropy rate is defined as

$$\overline{H}(\mathbf{Y}|\mathbf{X}) := \operatorname{p-}\limsup_{n\to\infty}\frac{1}{n}\log\frac{1}{P_{Y^n|X^n}(Y^n|X^n)}. \tag{91}$$

The spectral inf-conditional entropy rate is defined as in (91) with p-liminf in place of p-limsup.
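For a stationary memoryless pair, the normalized information density concentrates around the mutual information by the law of large numbers, so the spectral sup- and inf-mutual information rates coincide with $I(X;Y)$. The following Python sketch illustrates this concentration by simulation for a uniform binary input observed through a binary symmetric channel. The channel, the crossover probability 0.11, the seed and the function names are assumptions chosen only for this illustration; it is a Monte Carlo sketch, not a computation of the p-limsup itself.

```python
import math
import random

def binary_entropy(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def normalized_information_density(n, p, rng):
    """Sample (1/n) log2 [P_{Y^n|X^n}(Y^n|X^n) / P_{Y^n}(Y^n)] for a BSC(p)
    with i.i.d. uniform inputs; then P_Y is uniform and each letter contributes
    log2(2p) if the bit is flipped and log2(2(1-p)) otherwise."""
    total = 0.0
    for _ in range(n):
        flipped = rng.random() < p
        total += math.log2(2 * p) if flipped else math.log2(2 * (1 - p))
    return total / n

rng = random.Random(0)
p = 0.11
mutual_info = 1.0 - binary_entropy(p)  # I(X;Y) for uniform input over a BSC(p)
samples = [normalized_information_density(20000, p, rng) for _ in range(20)]
max_deviation = max(abs(s - mutual_info) for s in samples)
```

As the blocklength grows, the samples cluster ever more tightly around `mutual_info`, which is the behaviour the limit operations in Definition 7 formalize for general (not necessarily memoryless) sequences.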

A. General Formula for the WAK problem 

In this section, we consider sequences of the WAK problem indexed by the blocklength $n$ where the sequence of source distributions $\{P_{X^nY^n}\}_{n=1}^{\infty}$ is general, i.e., we do not place any assumptions on the structure of the source such as stationarity, memorylessness and ergodicity. We aim to characterize an inner bound to the optimal rate region defined in (13). We show that our inner bound coincides with that derived by Miyake and Kanaya [8] but is derived based on the upper bound on the error probability provided in our CS-type bound in Corollary 15. The choice of the parameters $\gamma_{\mathrm{b}}$, $\gamma_{\mathrm{c}}$ and $\delta$ plays a crucial role and guides our choice of these parameters for second-order coding analysis in the following section.

Let $\mathscr{P}(\{P_{X^nY^n}\}_{n=1}^{\infty})$ be the set of all sequences of distributions $\{P_{U^nX^nY^n}\}_{n=1}^{\infty}$ such that for every $n\ge 1$, $U^n - Y^n - X^n$ forms a Markov chain and the $(\mathcal{X}^n\times\mathcal{Y}^n)$-marginal of $P_{U^nX^nY^n}$ is $P_{X^nY^n}$. Define the set

$$\mathscr{R}^*_{\mathrm{WAK}} := \bigcup_{\{P_{U^nX^nY^n}\}_{n=1}^{\infty}\in\mathscr{P}(\{P_{X^nY^n}\}_{n=1}^{\infty})} \big\{(R_1,R_2)\in\mathbb{R}_+^2 : R_1\ge\overline{H}(\mathbf{X}|\mathbf{U}),\ R_2\ge\overline{I}(\mathbf{U};\mathbf{Y})\big\}. \tag{92}$$

Theorem 21 (Inner Bound to the Optimal Rate Region for WAK [8]). We have

$$\mathscr{R}^*_{\mathrm{WAK}} \subset \mathscr{R}_{\mathrm{WAK}}. \tag{93}$$

We remark that by using techniques from [37], Miyake and Kanaya [8] showed that (93) is in fact an equality, i.e., the region in (92) is also an outer bound to $\mathscr{R}_{\mathrm{WAK}}$. In addition, when the source distributions $\{P_{X^nY^n}\}_{n=1}^{\infty}$ are stationary and memoryless (and the alphabets $\mathcal{X}$ and $\mathcal{Y}$ are discrete and finite), the region in (92) reduces to the single-letter region defined in (37). This follows easily from the law of large numbers. The proof of Theorem 21 follows directly from the finite blocklength bound in Corollary 15. In fact, the weaker bounds in Theorems 3 and 5 suffice for this purpose.

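Proof: Consider (80) and let us fix a process $\{P_{U^nX^nY^n}\}_{n=1}^\infty \in \mathcal{P}(\{P_{X^nY^n}\}_{n=1}^\infty)$ and a constant $\eta > 0$. Set

$$\frac{1}{n}\log|\mathcal{M}| := \overline{H}(\mathbf{X}|\mathbf{U}) + 2\eta \tag{94}$$

$$\frac{1}{n}\log|\mathcal{L}| := \overline{I}(\mathbf{U};\mathbf{Y}) + 2\eta \tag{95}$$

$$\gamma_b := n(\overline{H}(\mathbf{X}|\mathbf{U}) + \eta) \tag{96}$$

$$\gamma_c := n(\overline{I}(\mathbf{U};\mathbf{Y}) + \eta) \tag{97}$$

$$\delta := 2^{-n\eta/2}. \tag{98}$$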
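Then for blocklength $n$, the probability on the RHS of (80) can be written as

$$P_{U^nX^nY^n}\left[\left\{\frac{1}{n}\log\frac{1}{P_{X^n|U^n}(X^n|U^n)} > \overline{H}(\mathbf{X}|\mathbf{U}) + \eta\right\} \cup \left\{\frac{1}{n}\log\frac{P_{Y^n|U^n}(Y^n|U^n)}{P_{Y^n}(Y^n)} > \overline{I}(\mathbf{U};\mathbf{Y}) + \eta\right\}\right]. \tag{99}$$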



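By the definition of the spectral sup-entropy rate and the spectral sup-mutual information rate, the probabilities of
both events in (99) tend to zero. Further, the residual terms in (80) vanish as $n$ grows: by (94) and (96),

$$\frac{2^{\gamma_b}}{|\mathcal{M}|} = 2^{-n\eta} \to 0, \tag{100}$$

and the remaining terms decay as $2^{-n\eta/2}$.

Hence, $P_e(\Phi_n) \to 0$. Since $\eta > 0$ is arbitrary, from (94) and (95) we deduce that any pair of rates $(R_1,R_2)$
satisfying $R_1 \ge \overline{H}(\mathbf{X}|\mathbf{U})$ and $R_2 \ge \overline{I}(\mathbf{U};\mathbf{Y})$ is achievable. ■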



B. General Formula for the WZ problem

In a similar way, we can recover the general formula for WZ coding derived by Iwata and Muramatsu [9]. Note,
however, that we directly work with the probability of excess distortion, which is related to but different from the
maximum-distortion criterion employed in [9]. Once again, we assume that the source $\{P_{X^nY^n}\}_{n=1}^\infty$ is general
in the sense explained in Section V-A.

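Let $\mathcal{P}_D(\{P_{X^nY^n}\}_{n=1}^\infty)$ be the set of all sequences of distributions $\{P_{U^nX^nY^n}\}_{n=1}^\infty$ and reproduction functions
$\{g_n : \mathcal{U}^n \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n\}_{n=1}^\infty$ such that for every $n \ge 1$, $U^n - X^n - Y^n$ forms a Markov chain, the $(\mathcal{X}^n \times \mathcal{Y}^n)$-marginal
of $P_{U^nX^nY^n}$ is $P_{X^nY^n}$ and

$$\text{p-}\limsup_{n\to\infty}\, d_n(X^n, g_n(U^n,Y^n)) \le D. \tag{101}$$

Define the rate-distortion function

$$R^*_{\mathrm{WZ}}(D) := \inf\left\{\overline{I}(\mathbf{U};\mathbf{X}) - \underline{I}(\mathbf{U};\mathbf{Y})\right\} \tag{102}$$

where the infimum is over all $\{(P_{U^nX^nY^n}, g_n)\}_{n=1}^\infty \in \mathcal{P}_D(\{P_{X^nY^n}\}_{n=1}^\infty)$.

Theorem 22 (Upper Bound to the Rate-Distortion Function for WZ [9]). We have

$$R_{\mathrm{WZ}}(D) \le R^*_{\mathrm{WZ}}(D). \tag{103}$$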

Iwata and Muramatsu [9] showed in fact that (103) is an equality by proving a converse along the lines of [37].
It can be shown that the general rate-distortion function defined in (102) reduces to the one derived by Wyner and
Ziv [4] in the case where the alphabets are finite and the source is stationary and memoryless.

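Proof: Let $\eta > 0$. We start from the bound on the probability of excess distortion in (85), where we first consider
$D + \eta$ instead of $D$. Let us fix the sequence of distributions and the sequence of functions $\{(P_{U^nX^nY^n}, g_n)\}_{n=1}^\infty \in
\mathcal{P}_D(\{P_{X^nY^n}\}_{n=1}^\infty)$. Set

$$\frac{1}{n}\log|\mathcal{M}| := \overline{I}(\mathbf{U};\mathbf{X}) - \underline{I}(\mathbf{U};\mathbf{Y}) + 4\eta \tag{104}$$

$$\frac{1}{n}\log L := \overline{I}(\mathbf{U};\mathbf{X}) + 2\eta \tag{105}$$

$$\gamma_p := n(\underline{I}(\mathbf{U};\mathbf{Y}) - \eta) \tag{106}$$

$$\gamma_c := n(\overline{I}(\mathbf{U};\mathbf{X}) + \eta) \tag{107}$$

and $\delta$ as in (98). Then, the probability in (85) for blocklength $n$ can be written as

$$P_{U^nX^nY^n}\left[\left\{\frac{1}{n}\log\frac{P_{Y^n|U^n}(Y^n|U^n)}{P_{Y^n}(Y^n)} < \underline{I}(\mathbf{U};\mathbf{Y}) - \eta\right\} \cup \left\{\frac{1}{n}\log\frac{P_{X^n|U^n}(X^n|U^n)}{P_{X^n}(X^n)} > \overline{I}(\mathbf{U};\mathbf{X}) + \eta\right\} \cup \left\{d_n(X^n, g_n(U^n,Y^n)) > D + \eta\right\}\right]. \tag{108}$$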
By the definition of the spectral sup- and inf-mutual information rates and the distortion condition in (101), we
observe that the probability in (108) tends to zero as $n$ grows. By a similar calculation as in (100), the other terms
in (85) also tend to zero. Hence, the probability of excess distortion $P_e(\Phi_n; D + \eta) \to 0$ as $n$ grows. This holds
for every $\eta > 0$. By (104), the rate $\overline{I}(\mathbf{U};\mathbf{X}) - \underline{I}(\mathbf{U};\mathbf{Y}) + 4\eta$ is achievable. In order to complete the
proof, we choose a positive sequence satisfying $\eta_1 > \eta_2 > \cdots > 0$ and $\eta_k \to 0$ as $k \to \infty$. Then, by using the
diagonal line argument [7, Thm. 1.8.2], we complete the proof of (103). ■



C. General Formula for the GP problem 

We conclude this section by showing that the non-asymptotic bound on the average probability of error derived
in Corollary 20 can be adapted to recover the general formula for the GP problem derived in Tan [10]. Here, both
the state distribution $\{P_{S^n} \in \mathcal{P}(\mathcal{S}^n)\}_{n=1}^\infty$ and the channel $\{W^n : \mathcal{X}^n \times \mathcal{S}^n \to \mathcal{Y}^n\}_{n=1}^\infty$ are general. In particular,
the only requirement on the stochastic mapping $W^n$ is that for every $(x^n, s^n) \in \mathcal{X}^n \times \mathcal{S}^n$,

$$\sum_{y^n \in \mathcal{Y}^n} W^n(y^n|x^n,s^n) = 1. \tag{109}$$

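Let $\mathcal{P}_\Gamma(\{W^n, P_{S^n}\}_{n=1}^\infty)$ be the family of sequences of joint distributions $\{P_{U^nS^nX^nY^n}\}_{n=1}^\infty$ such that for every $n \ge 1$, $U^n - (X^n, S^n) - Y^n$
forms a Markov chain, the $\mathcal{S}^n$-marginal of $P_{U^nS^nX^nY^n}$ is $P_{S^n}$, the channel law $P_{Y^n|X^n,S^n} = W^n$ and

$$\text{p-}\limsup_{n\to\infty}\, g_n(X^n) \le \Gamma. \tag{110}$$

Define the quantity

$$C^*_{\mathrm{GP}}(\Gamma) := \sup\left\{\underline{I}(\mathbf{U};\mathbf{Y}) - \overline{I}(\mathbf{U};\mathbf{S})\right\} \tag{111}$$

where the supremum is over all joint distributions $\{P_{U^nS^nX^nY^n}\}_{n=1}^\infty \in \mathcal{P}_\Gamma(\{W^n, P_{S^n}\}_{n=1}^\infty)$.

Theorem 23 (Lower Bound to the GP capacity [10]). We have

$$C_{\mathrm{GP}}(\Gamma) \ge C^*_{\mathrm{GP}}(\Gamma). \tag{112}$$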

Tan [10] also showed that the inequality in (112) is, in fact, tight. When the channel and state are discrete,
stationary and memoryless, Tan [10] showed that the general formula in (111) reduces to the conventional one
derived by Gel'fand-Pinsker [5] in (63). The proof of Theorem 23 parallels that for Theorem 22 and thus, we only
sketch the proof by providing the choices of $|\mathcal{M}|$, $L$, $\gamma_p$ and $\gamma_c$.

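Proof: Fix an $\eta > 0$ and a sequence of joint distributions $\{P_{U^nS^nX^nY^n}\}_{n=1}^\infty \in \mathcal{P}_\Gamma(\{W^n, P_{S^n}\}_{n=1}^\infty)$. Then
make the following choices:

$$\frac{1}{n}\log|\mathcal{M}| := \underline{I}(\mathbf{U};\mathbf{Y}) - \overline{I}(\mathbf{U};\mathbf{S}) - 4\eta \tag{113}$$

$$\frac{1}{n}\log L := \overline{I}(\mathbf{U};\mathbf{S}) + 2\eta \tag{114}$$

$$\gamma_p := n(\underline{I}(\mathbf{U};\mathbf{Y}) - \eta) \tag{115}$$

$$\gamma_c := n(\overline{I}(\mathbf{U};\mathbf{S}) + \eta). \tag{116}$$

Because (110) holds for the sequence of joint distributions $\{P_{U^nS^nX^nY^n}\}_{n=1}^\infty$, the average probability of error
$P_e(\Phi_n; \Gamma + \eta)$ in (87) tends to zero and the rate of the code is given by (113). The proof is completed by again
appealing to the diagonal line argument [7, Thm. 1.8.2]. ■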

VI. Achievable Second-Order Coding Rates 

In this section, we demonstrate achievable second-order coding rates [11], [12], [24], [41], [42] for the three
side-information problems of interest. Essentially, we are interested in characterizing the $(n, \varepsilon)$-optimal rate region
for the WAK problem, the $(n, \varepsilon)$-Wyner-Ziv rate-distortion function and the $(n, \varepsilon)$-capacity of the GP problem up
to the second-order term. We do this by applying the multidimensional Berry-Esseen theorem [23], [52] to the
finite blocklength CS-type bounds in Corollaries 15, 18 and 20. Throughout, we will not concern ourselves with
optimizing the third-order terms.

The following important definition will be used throughout this section. 
Definition 9. Let $k$ be a positive integer. Let $\mathbf{V} \in \mathbb{R}^{k\times k}$ be a positive-semidefinite matrix that is not the all-zeros
matrix but is allowed to be rank-deficient. Let the Gaussian random vector $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{V})$. Define the set

$$\mathcal{S}(\mathbf{V},\varepsilon) := \{\mathbf{z} \in \mathbb{R}^k : \Pr(\mathbf{Z} \le \mathbf{z}) \ge 1 - \varepsilon\}. \tag{117}$$

This set was introduced in E51 and is, roughly speaking, the multidimensional analogue of the Q^ 1 function. 
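Indeed, for $k = 1$ and any standard deviation $\sigma > 0$,

$$\mathcal{S}(\sigma^2,\varepsilon) = [\sigma Q^{-1}(\varepsilon), \infty). \tag{118}$$

Also, $\mathbf{1}_k$ and $\mathbf{0}_{k\times k}$ denote the length-$k$ all-ones column vector and the $k \times k$ all-zeros matrix respectively.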
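To make Definition 9 concrete, the following is a minimal Python sketch (standard library only) that evaluates the scalar endpoint in (118) and tests membership in $\mathcal{S}(\mathbf{V},\varepsilon)$ for $k = 2$. The function names, the midpoint-rule integration and the truncation at $-10$ standard deviations are illustrative assumptions and are not part of the paper; the sketch also assumes a full-rank $2 \times 2$ matrix.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def q_inv(eps):
    """Inverse Gaussian tail function: Q^{-1}(eps) = Phi^{-1}(1 - eps)."""
    return STD.inv_cdf(1.0 - eps)


def scalar_endpoint(sigma, eps):
    """Left endpoint sigma * Q^{-1}(eps) of S(sigma^2, eps) for k = 1, cf. (118)."""
    return sigma * q_inv(eps)


def bivariate_cdf(z1, z2, v11, v22, v12, steps=4000):
    """Pr(Z1 <= z1, Z2 <= z2) for a zero-mean Gaussian with covariance
    [[v11, v12], [v12, v22]], assuming v11, v22 > 0 and |rho| < 1.

    Conditions on the standardized first coordinate and integrates the
    conditional CDF of the second one with the midpoint rule.
    """
    s1, s2 = math.sqrt(v11), math.sqrt(v22)
    rho = v12 / (s1 * s2)
    upper, b = z1 / s1, z2 / s2
    lower = -10.0  # mass below -10 standard deviations is negligible
    if upper <= lower:
        return 0.0
    width = (upper - lower) / steps
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += STD.pdf(x) * STD.cdf((b - rho * x) / scale)
    return total * width


def in_S(z1, z2, v11, v22, v12, eps):
    """Membership test for the set S(V, eps) in (117) with k = 2."""
    return bivariate_cdf(z1, z2, v11, v22, v12) >= 1.0 - eps
```

For instance, `in_S(3.0, 3.0, 1.0, 1.0, 0.0, 0.1)` asks whether the point $(3,3)$ lies in $\mathcal{S}(\mathbf{I}_2, 0.1)$; a rank-deficient $\mathbf{V}$ would need separate handling.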



A. Achievable Second-Order Coding Rates for the WAK problem 

In this section, we derive an inner bound to $\mathcal{R}_{\mathrm{WAK}}(n, \varepsilon)$ in (11) by the use of Gaussian approximations. Instead
of simply applying the Berry-Esseen theorem to the information spectrum term within the simplified CS-type bound
in (80), we enlarge our inner bound by using a "time-sharing" variable $T$, which is independent of $(X, Y)$. This
technique was also used for the multiple access channel (MAC) by Huang and Moulin [45]. Note that in the finite
blocklength setting, the region $\mathcal{R}_{\mathrm{WAK}}(n, \varepsilon)$ does not have to be convex unlike in the asymptotic case; cf. (37). For
fixed finite sets $\mathcal{U}$ and $\mathcal{T}$, let $\tilde{\mathcal{P}}(P_{XY})$ be the set of all $P_{UTXY} \in \mathcal{P}(\mathcal{U} \times \mathcal{T} \times \mathcal{X} \times \mathcal{Y})$ such that the $\mathcal{X} \times \mathcal{Y}$-marginal
of $P_{UTXY}$ is $P_{XY}$, $U - (Y,T) - X$ forms a Markov chain and $T$ is independent of $(X, Y)$.

Definition 10. The entropy-information density vector for the WAK problem for $P_{UTXY} \in \tilde{\mathcal{P}}(P_{XY})$ is defined as

$$\mathbf{j}(U,X,Y|T) := \begin{bmatrix} \log\frac{1}{P_{X|UT}(X|U,T)} \\ \log\frac{P_{Y|UT}(Y|U,T)}{P_Y(Y)} \end{bmatrix}. \tag{119}$$

Note that the mean of the entropy-information density vector in (119) is the vector of the entropy and mutual
information, i.e.,

$$\mathbb{E}[\mathbf{j}(U,X,Y|T)] = \mathbf{J}(P_{UTXY}) := \begin{bmatrix} H(X|U,T) \\ I(U;Y|T) \end{bmatrix}. \tag{120}$$

The mutual information $I(U;Y|T) = I(U,T;Y)$ because $T$ and $Y$ are independent.

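Definition 11. The entropy-information dispersion matrix for the WAK problem for a fixed $P_{UTXY} \in \tilde{\mathcal{P}}(P_{XY})$ is
defined as

$$\mathbf{V}(P_{UTXY}) := \mathbb{E}_T[\mathrm{Cov}(\mathbf{j}(U,X,Y|T))] \tag{121}$$

$$= \sum_{t\in\mathcal{T}} P_T(t)\,\mathrm{Cov}(\mathbf{j}(U,X,Y|t)). \tag{122}$$

We abbreviate the deterministic quantities $\mathbf{J}(P_{UTXY}) \in \mathbb{R}^2$ and $\mathbf{V}(P_{UTXY}) \succeq 0$ as $\mathbf{J}$ and $\mathbf{V}$ respectively when
the distribution $P_{UTXY} \in \tilde{\mathcal{P}}(P_{XY})$ is obvious from the context.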

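Definition 12. If $\mathbf{V}(P_{UTXY}) \ne \mathbf{0}_{2\times 2}$, define $\mathcal{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY})$ to be the set of rate pairs $(R_1,R_2)$ such that
$\mathbf{R} := [R_1,R_2]^T$ satisfies

$$\mathbf{R} \in \mathbf{J} + \frac{\mathcal{S}(\mathbf{V},\varepsilon)}{\sqrt{n}} + \frac{2\log n}{n}\mathbf{1}_2. \tag{123}$$

If $\mathbf{V}(P_{UTXY}) = \mathbf{0}_{2\times 2}$, define $\mathcal{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY})$ to be the set of rate pairs $(R_1,R_2)$ such that

$$\mathbf{R} \in \mathbf{J} + \frac{2\log n}{n}\mathbf{1}_2. \tag{124}$$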

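From the simplified CS-type bound for the WAK problem in Corollary 15, we can derive the following:

Theorem 24 (Inner Bound to $(n, \varepsilon)$-Optimal Rate Region). For every $0 < \varepsilon < 1$ and all $n$ sufficiently large, the
$(n,\varepsilon)$-optimal rate region $\mathcal{R}_{\mathrm{WAK}}(n,\varepsilon)$ satisfies

$$\bigcup_{P_{UTXY} \in \tilde{\mathcal{P}}(P_{XY})} \mathcal{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY}) \subset \mathcal{R}_{\mathrm{WAK}}(n,\varepsilon). \tag{125}$$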
From the modified CS-type bound for the WAK problem in Theorem 16, we can derive the following:

Theorem 25 (Modified Inner Bound to $(n, \varepsilon)$-Optimal Rate Region). For every $0 < \varepsilon < 1$ and all $n$ sufficiently
large, the $(n,\varepsilon)$-optimal rate region $\mathcal{R}_{\mathrm{WAK}}(n,\varepsilon)$ satisfies

$$\bigcup_{P_{UTXY} \in \tilde{\mathcal{P}}(P_{XY})} \tilde{\mathcal{R}}_{\mathrm{in}}(n,\varepsilon; P_{UTXY}) \subset \mathcal{R}_{\mathrm{WAK}}(n,\varepsilon), \tag{126}$$

where $\tilde{\mathcal{R}}_{\mathrm{in}}(n, \varepsilon; P_{UTXY})$ is the set defined by replacing (123) with

R6U ( J + ^v.*> + [*-pF + ^ l2 l. (127) 

Remark 2. The bound in Theorem 25 is at least as tight as that in Theorem 24, and the former is strictly tighter
than the latter for a fixed test channel. However, it is not clear whether the improvement is strict when we
take the union over the test channels.

By setting $\mathcal{T} = \mathcal{Y} = \mathcal{U} = \emptyset$ and $R_2 = 0$ in Theorem 25,$^4$ we obtain a result first discovered by Strassen [41].

Corollary 26 (Achievable Second-Order Coding Rate for Lossless Source Coding). Define the second-order coding
rate for lossless source coding to be

$$\sigma(P_X,\varepsilon) := \limsup_{n\to\infty} \sqrt{n}\,(R_X(n, \varepsilon) - H(X)) \tag{128}$$

where $R_X(n,\varepsilon)$ is the minimal rate of almost-lossless compression of source $P_X$ at blocklength $n$ with error
probability not exceeding $\varepsilon$. Then,

$$\sigma(P_X,\varepsilon) \le \sqrt{\mathrm{Var}(\log P_X(X))}\,Q^{-1}(\varepsilon). \tag{129}$$

It is well-known that the result in Corollary 26 is tight, i.e., $\sqrt{\mathrm{Var}(\log P_X(X))}\,Q^{-1}(\varepsilon)$ is the second-order coding
rate for lossless source coding [11], [41], [42].
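As a quick numerical illustration of (129), the sketch below evaluates the Gaussian approximation $H(X) + \sqrt{\mathrm{Var}(\log P_X(X))/n}\,Q^{-1}(\varepsilon)$ for a Bernoulli source. The Bernoulli source, the function names and the omission of all third-order terms are assumptions made for this example only.

```python
import math
from statistics import NormalDist


def entropy_and_varentropy(p):
    """Return (H(X), Var(log 1/P_X(X))) in bits for X ~ Bernoulli(p), 0 < p < 1."""
    probs = (1.0 - p, p)
    info = [-math.log2(q) for q in probs]          # self-information of each symbol
    h = sum(q * i for q, i in zip(probs, info))    # entropy
    v = sum(q * (i - h) ** 2 for q, i in zip(probs, info))  # variance of self-information
    return h, v


def approx_rate(p, n, eps):
    """Gaussian approximation H + sqrt(V / n) * Q^{-1}(eps), third-order terms ignored."""
    h, v = entropy_and_varentropy(p)
    return h + math.sqrt(v / n) * NormalDist().inv_cdf(1.0 - eps)
```

For a uniform binary source the dispersion term vanishes, so the approximation returns exactly one bit per symbol at every blocklength.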

We refer the reader to Appendix I for the proof of Theorem 24 (Appendix J for the proof of Theorem 25).
The proof is based on the CS-type bound in (80) and the non-i.i.d. version of the multidimensional Berry-Esseen
theorem by Götze [23]. The interpretation of this result is clear: From (123), which is the non-degenerate case, we
see that the second-order coding rate region for a fixed $P_{UTXY}$ is represented by the set $\mathcal{S}(\mathbf{V}(P_{UTXY}), \varepsilon)/\sqrt{n}$.
Thus, the $(n, \varepsilon)$-optimal rate region converges to the asymptotic WAK region at a rate of $O(1/\sqrt{n})$, which can
be predicted by the central limit theorem. More importantly, because our finite blocklength bound in (80) treats
both the covering and binning error events jointly, this results in the coupling of the second-order rates through the
set $\mathcal{S}(\mathbf{V}(P_{UTXY}), \varepsilon)$ and hence, the dispersion matrix $\mathbf{V}(P_{UTXY})$. This shows that the correlation between the
entropy and information densities matters in the determination of the second-order coding rate.

More specifically, Theorems 24 and 25 are proved by taking $P_{U^n|Y^n}(u^n|y^n)$ to be equal to $P^n_{U|TY}(u^n|t^n, y^n)$
for some fixed (time-sharing) sequence $t^n \in \mathcal{T}^n$ and some joint distribution $P_{UTXY} \in \tilde{\mathcal{P}}(P_{XY})$. If $\mathcal{T} = \emptyset$, this
is essentially using i.i.d. codes. An alternative to this proof strategy is to use conditionally constant composition
codes as was done in Kelly-Wagner [33] to prove the error exponent result in Theorem 5. The advantage of this
strategy is that it may yield better dispersion matrices because the unconditional dispersion matrix always dominates
the conditional dispersion matrix [24, Lemma 62] (in the partial order induced by semi-definiteness). For using
conditionally constant composition codes, we fix a conditional type $V_{Q_Y} \in \mathcal{V}_n(\mathcal{U}; Q_Y)$ for every marginal type
$Q_Y \in \mathcal{P}_n(\mathcal{Y})$. Then, codewords are generated uniformly at random from $\mathcal{T}_{V_{Q_Y}}(y^n)$ if $y^n \in \mathcal{T}_{Q_Y}$. However, it does
not appear that this strategy yields improved second-order coding rates compared to using i.i.d. codes as given in
Theorems 24 and 25.

4 In fact, to be precise, we cannot derive Corollary 26 from Theorem 24 because there is the residual term $\frac{2\log n}{n}$ and we cannot set
$R_2 = 0$. However, we can use Corollary 15 with $\mathcal{U} = \emptyset$ to obtain Corollary 26 easily.
For comparison, for a fixed $P_{UXY} \in \tilde{\mathcal{P}}(P_{XY})$, define $\mathcal{R}_{\mathrm{V}}(n, \varepsilon; P_{UXY})$ to be the set of rate pairs that satisfy

(130) 
(131) 



V re n 



n n 
for some $\lambda \in [0, 1]$ where the marginal entropy and information dispersions are defined as

$$V_H(X|U) := \mathrm{Var}\left(\log\frac{1}{P_{X|U}(X|U)}\right) \tag{132}$$

$$V_I(U;Y) := \mathrm{Var}\left(\log\frac{P_{Y|U}(Y|U)}{P_Y(Y)}\right) \tag{133}$$

respectively. Note that if $\mathcal{T} = \emptyset$, then $V_H(X|U)$ and $V_I(U;Y)$ are the diagonal elements of the matrix $\mathbf{V}(P_{UTXY})$
in (121). It can easily be seen that Verdú's bound on the error probability of the WAK problem (42) yields the
following inner bound on $\mathcal{R}_{\mathrm{WAK}}(n, \varepsilon)$:

$$\bigcup_{P_{UXY} \in \tilde{\mathcal{P}}(P_{XY})} \mathcal{R}_{\mathrm{V}}(n, \varepsilon; P_{UXY}) \subset \mathcal{R}_{\mathrm{WAK}}(n, \varepsilon). \tag{134}$$

This "splitting" technique of $\varepsilon$ into $\lambda\varepsilon$ and $(1 - \lambda)\varepsilon$ in (130) and (131) was used by MolavianJazi and Laneman [46]
in their work on finite blocklength analysis for the MAC. In Section VII, we numerically compare the inner bounds
for the WAK problem provided in (125), (126) and (134).



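Remark 3. From the non-asymptotic bound in Remark 1, we can also show that

$$\tilde{\mathcal{R}}_{\mathrm{in}}(n,\varepsilon) \subset \mathcal{R}_{\mathrm{WAK}}(n,\varepsilon), \tag{135}$$

where $\tilde{\mathcal{R}}_{\mathrm{in}}(n, \varepsilon)$ is the set of rate pairs $(R_1,R_2)$ such that

$$\begin{bmatrix} R_1 \\ R_1 + R_2 \end{bmatrix} \in \begin{bmatrix} H(X|Y) \\ H(X,Y) \end{bmatrix} + \frac{\mathcal{S}(\mathbf{V},\varepsilon)}{\sqrt{n}} + \frac{2\log n}{n}\mathbf{1}_2 \tag{136}$$

for the covariance matrix

$$\mathbf{V} = \mathrm{Cov}\left(\begin{bmatrix} \log\frac{1}{P_{X|Y}(X|Y)} \\ \log\frac{1}{P_{XY}(X,Y)} \end{bmatrix}\right). \tag{137}$$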



B. Achievable Second-Order Coding Rates for the WZ problem 

In this section, we leverage the simplified CS-type bound in Corollary 18 to derive an achievable second-order
coding rate for the WZ problem. We do so by first finding an inner bound to the $(n, \varepsilon)$-Wyner-Ziv rate-distortion
region $\mathcal{R}_{\mathrm{WZ}}(n,\varepsilon)$ defined in (17). Subsequently we find an upper bound to the $(n, \varepsilon)$-Wyner-Ziv rate-distortion
function $R_{\mathrm{WZ}}(n,\varepsilon,D)$ defined in (20). We also show that the (direct part of the) dispersion of lossy source coding
found by Ingber-Kochman [27] and Kostina-Verdú [28] can be recovered from the CS-type bound in Corollary 18.
This is not unexpected because the lossy source coding (rate-distortion) problem is a special case of the Wyner-Ziv
problem where the side-information is absent.

We will again employ the "time-sharing" strategy used in Section VI-A. Note again that in the finite-blocklength
setting $\mathcal{R}_{\mathrm{WZ}}(n,\varepsilon)$ does not have to be convex, unlike in the asymptotic setting. For fixed finite sets $\mathcal{U}$ and $\mathcal{T}$,
let $\tilde{\mathcal{P}}(P_{XY})$ be the collection of all pairs of joint distributions $P_{UTXY} \in \mathcal{P}(\mathcal{U} \times \mathcal{T} \times \mathcal{X} \times \mathcal{Y})$ and functions
$g : \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}}$ such that the $\mathcal{X} \times \mathcal{Y}$-marginal of $P_{UTXY}$ is $P_{XY}$, $U - (X,T) - Y$ forms a Markov chain and $T$
is independent of $(X,Y)$. Note that $g$ does not necessarily have to satisfy the distortion constraint in (47).

In addition, to facilitate the time-sharing for the distortion function, we define

$$d(X,g(U,Y)|T) := d(X_T, g(U_T,Y_T)) \tag{138}$$

where the random variables $(U_t,X_t,Y_t)$ for any $t \in \mathcal{T}$ have distribution $P_{UXY|T=t}$.



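Definition 13. The information-density-distortion vector for the WZ problem for $(P_{UTXY}, g) \in \tilde{\mathcal{P}}(P_{XY})$ is defined
as

$$\mathbf{j}(U,X,Y|T) := \begin{bmatrix} -\log\frac{P_{Y|UT}(Y|U,T)}{P_Y(Y)} \\ \log\frac{P_{X|UT}(X|U,T)}{P_X(X)} \\ d(X,g(U,Y)|T) \end{bmatrix}. \tag{139}$$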



Since $\sum_t P_T(t)\,\mathbb{E}_{P_{UXY|T=t}}[d(X_T, g(U_T,Y_T))\,|\,T = t] = \mathbb{E}[d(X, g(U,Y))]$, the expectation of the information-density-distortion
vector is given by

$$\mathbb{E}[\mathbf{j}(U,X,Y|T)] = \mathbf{J}(P_{UTXY},g) := \begin{bmatrix} -I(U;Y|T) \\ I(U;X|T) \\ \mathbb{E}[d(X, g(U,Y))] \end{bmatrix}. \tag{140}$$

Observe that the sum of the first two components of (140) resembles the Wyner-Ziv rate-distortion function defined
in (48). As such, when stating an achievable $(n, \varepsilon)$-Wyner-Ziv rate-distortion region, we project the first two terms
onto an affine subspace representing their sum. See (143) and (144) below.

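Definition 14. The information-distortion dispersion matrix for the WZ problem for a fixed $(P_{UTXY}, g) \in \tilde{\mathcal{P}}(P_{XY})$
is defined as

$$\mathbf{V}(P_{UTXY}, g) := \mathbb{E}_T[\mathrm{Cov}(\mathbf{j}(U,X,Y|T))]. \tag{141}$$

Definition 15. Let $\mathbf{M} \in \mathbb{R}^{2\times 3}$ be the matrix

$$\mathbf{M} := \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{142}$$

If $\mathbf{V}(P_{UTXY}, g) \ne \mathbf{0}_{3\times 3}$, define $\mathcal{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY}, g)$ to be the set of all rate-distortion pairs $(R,D)$ satisfying

$$\begin{bmatrix} R \\ D \end{bmatrix} \in \mathbf{M}\left(\mathbf{J} + \frac{\mathcal{S}(\mathbf{V},\varepsilon)}{\sqrt{n}} + \frac{2\log n}{n}\mathbf{1}_3\right) \tag{143}$$

where $\mathbf{J} := \mathbf{J}(P_{UTXY},g)$ and $\mathbf{V} := \mathbf{V}(P_{UTXY},g)$. If $\mathbf{V}(P_{UTXY},g) = \mathbf{0}_{3\times 3}$, define $\mathcal{R}_{\mathrm{in}}(n,\varepsilon; P_{UTXY}, g)$ to
be the set of all rate-distortion pairs $(R, D)$ satisfying

$$\begin{bmatrix} R \\ D \end{bmatrix} \in \mathbf{M}\left(\mathbf{J} + \frac{2\log n}{n}\mathbf{1}_3\right). \tag{144}$$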



In (143), the matrix $\mathbf{M}$ serves to project the three-dimensional set $\mathbf{J} + \mathcal{S}(\mathbf{V},\varepsilon)/\sqrt{n} \subset \mathbb{R}^3$ onto two dimensions
by linearly combining the first two mutual information terms to give $I(U;X|T) - I(U;Y|T) = I(U; X|Y,T)$ (by
the Markov chain $U - (X, T) - Y$). From the simplified CS-type bound for the WZ problem in Corollary 18 and
the multidimensional Berry-Esseen theorem [23], we can derive the following:
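The projection by $\mathbf{M}$ can be sketched in a few lines of Python. The sketch assumes that $\mathbf{M}$ sums the first two coordinates and keeps the third, as described in the text; the numerical inputs in the usage note are hypothetical and serve only to illustrate the first-order point $\mathbf{M}\mathbf{J}$.

```python
def project(vec):
    """Apply M = [[1, 1, 0], [0, 0, 1]] to a length-3 vector."""
    m = ((1, 1, 0), (0, 0, 1))
    return [sum(a * b for a, b in zip(row, vec)) for row in m]


def wz_first_order(i_uy, i_ux, mean_distortion):
    """Project J = [-I(U;Y|T), I(U;X|T), E d] onto the (rate, distortion) plane."""
    return project([-i_uy, i_ux, mean_distortion])
```

For example, `wz_first_order(0.2, 0.7, 0.1)` returns the pair consisting of $I(U;X|T) - I(U;Y|T) = 0.5$ and the expected distortion $0.1$ (up to floating-point rounding).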



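Theorem 27 (Inner Bound to the $(n, \varepsilon)$-Wyner-Ziv Rate-Distortion Region). For every $0 < \varepsilon < 1$ and all $n$
sufficiently large, the $(n,\varepsilon)$-Wyner-Ziv rate-distortion region $\mathcal{R}_{\mathrm{WZ}}(n,\varepsilon)$ satisfies

$$\bigcup_{(P_{UTXY}, g) \in \tilde{\mathcal{P}}(P_{XY})} \mathcal{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY}, g) \subset \mathcal{R}_{\mathrm{WZ}}(n,\varepsilon). \tag{145}$$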

The proof of this result is provided in Appendix K. Further projecting onto the first dimension (the rate) for a
fixed distortion level $D$ yields the following:

Theorem 28 (Upper Bound to the $(n, \varepsilon)$-Wyner-Ziv Rate-Distortion Function). For every $0 < \varepsilon < 1$ and all $n$
sufficiently large, the $(n, \varepsilon)$-Wyner-Ziv rate-distortion function $R_{\mathrm{WZ}}(n,\varepsilon,D)$ satisfies

$$R_{\mathrm{WZ}}(n,\varepsilon,D) \le \inf\left\{ R : (R,D) \in \bigcup_{(P_{UTXY}, g) \in \tilde{\mathcal{P}}(P_{XY})} \mathcal{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY}, g) \right\}. \tag{146}$$

Theorems 27 and 28 are very similar in spirit to the result on the achievable second-order coding rate for the
WAK problem. The marginal contributions from the distortion error event, the packing error event, the covering
error event as well as their correlations are all involved in the dispersion matrix $\mathbf{V}(P_{UTXY}, g)$.
It is natural to wonder whether we are able to recover the dispersion for lossy source coding [27], [28] as a
special case of Theorem 28 (like Corollary 26 is a special case of Theorem 25). This does not seem straightforward
because of the distortion error event in (85). However, we can start from the CS-type bound in (85), set $\mathcal{Y} = \emptyset$,
$\mathcal{U} = \hat{\mathcal{X}}$ and use the method of types [30] or the notion of the $D$-tilted information [28] to obtain the specialization
for the direct part. Before stating the result, we define a few quantities. Let the rate-distortion function of the source
$X \sim Q \in \mathcal{P}(\mathcal{X})$ be denoted as

$$R(Q,D) := \min_{P_{\hat{X}|X}\,:\,P_X = Q,\ \mathbb{E}d(X,\hat{X}) \le D} I(X;\hat{X}), \tag{147}$$

where $\mathbb{E}d(X,\hat{X}) := \sum_{x,\hat{x}} P_X(x)P_{\hat{X}|X}(\hat{x}|x)\,d(x,\hat{x})$. Also, define the $D$-tilted information to be

$$j(x,D) := -\log \mathbb{E}\left[\exp\left(\lambda^* D - \lambda^* d(x,\hat{X}^*)\right)\right] \tag{148}$$

where the expectation is with respect to the unconditional distribution of $\hat{X}^*$, the output distribution that optimizes
the rate-distortion function in (147), and

$$\lambda^* := -\frac{\partial}{\partial D}R(P_X,D). \tag{149}$$

Theorem 29 (Achievable Second-Order Coding Rate for Lossy Source Coding). Define the second-order coding
rate for lossy source coding to be

$$\sigma(P_X,D,\varepsilon) := \limsup_{n\to\infty} \sqrt{n}\,(R_X(n, \varepsilon; D) - R(P_X, D)) \tag{150}$$

where $R_X(n, \varepsilon; D)$ is the minimal rate of compression of source $X \sim P_X$ up to distortion $D$ at blocklength $n$ and
probability of excess distortion not exceeding $\varepsilon$. We have

$$\sigma(P_X,D,\varepsilon) \le \sqrt{\mathrm{Var}(j(X,D))}\,Q^{-1}(\varepsilon). \tag{151}$$

Two proofs of Theorem 29 are provided in Appendix L, one based on the method of types and the other based
on the $D$-tilted information in (148). For the former proof based on the method of types, we need to assume
that $Q \mapsto R(Q, D)$ is differentiable in a small neighborhood of $P_X$ and $P_X$ is supported on a finite set. For the
second proof, $\mathcal{X}$ can be an abstract alphabet. Note that $R(P_X, D) = \mathbb{E}_{X\sim P_X}[j(X, D)]$. We remark that for discrete
memoryless sources, the $D$-tilted information $j(x,D)$ coincides with the derivative of the rate-distortion function
with respect to the source [27]

$$R'(x,D) = \left.\frac{\partial}{\partial Q(x)}R(Q,D)\right|_{Q=P_X}. \tag{152}$$
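The identity $R(P_X, D) = \mathbb{E}[j(X, D)]$ can be checked numerically in the simplest case. The sketch below evaluates (148) in nats for a Bernoulli($p$) source under Hamming distortion. It relies on standard textbook facts for this case that are not stated in the paper: $R(P_X,D) = h(p) - h(D)$ for $0 < D < \min(p, 1-p)$, the optimal output distribution $\Pr(\hat{X}^* = 1) = (p - D)/(1 - 2D)$, and hence $\lambda^* = \ln((1-D)/D)$. All function names are hypothetical.

```python
import math


def h_nats(x):
    """Binary entropy function in nats."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def d_tilted_information(x, p, d):
    """Evaluate (148) in nats for X ~ Bernoulli(p) with Hamming distortion,
    assuming 0 < d < min(p, 1 - p)."""
    lam = math.log((1.0 - d) / d)       # lambda* = -dR/dD for R = h(p) - h(D)
    q1 = (p - d) / (1.0 - 2.0 * d)      # Pr(Xhat* = 1) for the optimal output
    output = {0: 1.0 - q1, 1: q1}
    expectation = sum(
        q * math.exp(lam * d - lam * (0 if x == xhat else 1))
        for xhat, q in output.items()
    )
    return -math.log(expectation)


def rate_distortion(p, d):
    """R(P_X, D) = h(p) - h(D) in nats for the binary-Hamming case."""
    return h_nats(p) - h_nats(d)
```

In this case the $D$-tilted information reduces to $\ln(1/P_X(x)) - h(D)$, so its variance equals the varentropy of the source.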



C. Achievable Second-Order Coding Rates for the GP problem 

We conclude this section by stating an achievable second-order coding rate for the GP problem by presenting
a lower bound to the $(n, \varepsilon, \Gamma)$-capacity $C_{\mathrm{GP}}(n, \varepsilon, \Gamma)$ defined in (34). As in the previous two subsections, we
start with definitions. For two finite sets $\mathcal{U}$ and $\mathcal{T}$, define $\tilde{\mathcal{P}}(W,P_S)$ to be the collection of all $P_{TUSXY} \in
\mathcal{P}(\mathcal{T} \times \mathcal{U} \times \mathcal{S} \times \mathcal{X} \times \mathcal{Y})$ such that the $\mathcal{S}$-marginal of $P_{TUSXY}$ is $P_S$, $P_{Y|XS} = W$, $U - (X, S,T) - Y$ forms a
Markov chain and $T$ is independent of $(S, X, Y)$. Note that $P_{TUSXY}$ does not necessarily have to satisfy the cost
constraint in (62).

In addition, to facilitate the time-sharing for the cost function, we define

$$g(X|T) := g(X_T) \tag{153}$$

where $X_t$ for any $t \in \mathcal{T}$ has distribution $P_{X|T=t}$.

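Definition 16. The information-density-cost vector for the GP problem for $P_{TUSXY} \in \tilde{\mathcal{P}}(W,P_S)$ is defined as

$$\mathbf{j}(U,S,X,Y|T) := \begin{bmatrix} \log\frac{P_{Y|UT}(Y|U,T)}{P_Y(Y)} \\ -\log\frac{P_{S|UT}(S|U,T)}{P_S(S)} \\ -g(X|T) \end{bmatrix}. \tag{154}$$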
Since $\sum_t P_T(t)\,\mathbb{E}_{P_{X|T=t}}[g(X_T)\,|\,T = t] = \mathbb{E}[g(X)]$, the expectation of this vector with respect to $P_{TUSXY}$ is the
vector of mutual informations and the negative cost, i.e.,

$$\mathbb{E}[\mathbf{j}(U,S,X,Y|T)] = \mathbf{J}(P_{TUSXY}) := \begin{bmatrix} I(U;Y|T) \\ -I(U;S|T) \\ -\mathbb{E}[g(X)] \end{bmatrix}. \tag{155}$$

Definition 17. The information-dispersion matrix for the GP problem for $P_{TUSXY} \in \tilde{\mathcal{P}}(W,P_S)$ is defined as

$$\mathbf{V}(P_{TUSXY}) := \mathbb{E}_T[\mathrm{Cov}(\mathbf{j}(U,S,X,Y|T))]. \tag{156}$$



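Definition 18. Let $\mathbf{M}$ be the matrix defined in (142). If $\mathbf{V}(P_{TUSXY}) \ne \mathbf{0}_{3\times 3}$, define the set $\mathcal{R}_{\mathrm{in}}(n,\varepsilon; P_{TUSXY})$ to
be the set of all rate-cost pairs $(R, \Gamma)$ satisfying

$$\begin{bmatrix} R \\ -\Gamma \end{bmatrix} \in \mathbf{M}\left(\mathbf{J} - \frac{\mathcal{S}(\mathbf{V},\varepsilon)}{\sqrt{n}} - \frac{2\log n}{n}\mathbf{1}_3\right) \tag{157}$$

where $\mathbf{J} := \mathbf{J}(P_{TUSXY})$ and $\mathbf{V} := \mathbf{V}(P_{TUSXY})$. Else if $\mathbf{V}(P_{TUSXY}) = \mathbf{0}_{3\times 3}$, define $\mathcal{R}_{\mathrm{in}}(n, \varepsilon; P_{TUSXY})$ to be
the set of all rate-cost pairs $(R, \Gamma)$ satisfying

$$\begin{bmatrix} R \\ -\Gamma \end{bmatrix} \in \mathbf{M}\left(\mathbf{J} - \frac{2\log n}{n}\mathbf{1}_3\right). \tag{158}$$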

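By leveraging our finite blocklength CS-type bound for the GP problem in (87), we obtain the following:

Theorem 30 (Inner Bound to the $(n,\varepsilon)$-GP Capacity-Cost Region). For every $0 < \varepsilon < 1$ and all $n$ sufficiently
large, the $(n,\varepsilon)$-GP capacity-cost region $\mathcal{C}_{\mathrm{GP}}(n,\varepsilon)$ satisfies

$$\bigcup_{P_{TUSXY} \in \tilde{\mathcal{P}}(W,P_S)} \mathcal{R}_{\mathrm{in}}(n,\varepsilon; P_{TUSXY}) \subset \mathcal{C}_{\mathrm{GP}}(n,\varepsilon). \tag{159}$$

By projecting onto the first dimension (the rate) for a fixed cost $\Gamma > 0$, we obtain:

Theorem 31 (Lower Bound to the $(n, \varepsilon)$-GP Capacity). For every $0 < \varepsilon < 1$ and all $n$ sufficiently large, the
$(n,\varepsilon)$-GP capacity-cost function $C_{\mathrm{GP}}(n,\varepsilon,\Gamma)$ satisfies

$$C_{\mathrm{GP}}(n,\varepsilon,\Gamma) \ge \sup\left\{ R : (R,\Gamma) \in \bigcup_{P_{TUSXY} \in \tilde{\mathcal{P}}(W,P_S)} \mathcal{R}_{\mathrm{in}}(n, \varepsilon; P_{TUSXY}) \right\}. \tag{160}$$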

The proof of Theorem 30 can be found in Appendix M. The matrix $\mathbf{M}$ serves to project the first two components
of each element in the set $\mathbf{J} - \mathcal{S}(\mathbf{V}, \varepsilon)/\sqrt{n}$ onto one dimension. Indeed, for a fixed $P_{TUSXY} \in \tilde{\mathcal{P}}(W,P_S)$, the
first two components read $I(U;Y|T) - I(U;S|T)$ which, if $\mathcal{T} = \emptyset$ and the random variables $(U,S,X,Y)$ are
capacity-achieving, reduces to the GP formula in (63). Hence, the set $\mathbf{M}\mathcal{S}(\mathbf{V}, \varepsilon)/\sqrt{n} \subset \mathbb{R}^2$ quantifies all possible
backoffs from the asymptotic GP capacity-cost region $\mathcal{C}_{\mathrm{GP}}$ (defined in (33)) at blocklength $n$ and average error
probability $\varepsilon$ based on our CS-type finite blocklength bound for the GP problem in (87). The bound in (160) is
clearly much tighter than the one provided in [10], which is based on the use of Wyner's PBL and Markov lemma.

Now by setting $\mathcal{S} = \mathcal{T} = \emptyset$, $U = X$ and $\Gamma = \infty$ in Theorem 31, we recover the direct part of the second-order
coding rate for channel coding without cost constraints [12], [24], [41].

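Corollary 32 (Achievable Second-Order Coding Rate for Channel Coding). Fix a non-exotic discrete memoryless
channel $W : \mathcal{X} \to \mathcal{Y}$ with channel capacity $C(W) = \max_{P_X} I(X; Y)$. Define the second-order coding rate
for channel coding to be

$$\sigma(W,\varepsilon) := \limsup_{n\to\infty} \sqrt{n}\,(C(W) - C_W(n,\varepsilon)) \tag{161}$$

where $C_W(n,\varepsilon)$ is the maximal rate of transmission over the channel $W$ at blocklength $n$ and average error
probability $\varepsilon$. Then,

$$\sigma(W,\varepsilon) \le \min_{P_{X^*}} \sqrt{\mathrm{Var}\left(\log\frac{W(Y^*|X^*)}{P_{Y^*}(Y^*)}\right)}\,Q^{-1}(\varepsilon) \tag{162}$$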
where $(X^*,Y^*) \sim P_{X^*} \times W$ and the minimization is over all capacity-achieving input distributions.

The bound in (162) has long been known to be an equality [41]. Note that the unconditional dispersion in (162),
$\mathrm{Var}\left(\log\frac{W(Y^*|X^*)}{P_{Y^*}(Y^*)}\right)$, coincides with the conditional dispersion [24] since it is being evaluated at a capacity-achieving
input distribution. As such, the converse can be proved using the meta-converse in [24] or a modification of
the Verdú-Han converse [7, Lem. 3.2.2] with a judiciously chosen output distribution as was done in [12]. In
fact, we can also derive a generalization of Corollary 32 with cost constraints incorporated [12, Thm. 3] using
similar techniques as in the proof of Theorem 29. Namely, we use a uniform distribution over a particular type
class (constant composition codes) as the input distribution. The type is chosen to be close to the optimal input
distribution (assuming it is unique).

VII. Numerical Examples 

A. Numerical Example for WAK Problem 

In this section, we use an example to illustrate the inner bound on the $(n, \varepsilon)$-optimal rate region for the WAK problem
obtained in Theorem 24. We neglect the small $O\left(\frac{\log n}{n}\right)$ term. The source is taken to be a discrete symmetric binary
source DSBS($\alpha$), i.e.,

$$P_{XY} = \frac{1}{2}\begin{bmatrix} 1-\alpha & \alpha \\ \alpha & 1-\alpha \end{bmatrix}. \tag{163}$$

In this case, the optimal rate region reduces to

$$\mathcal{R}^*_{\mathrm{WAK}} = \left\{(R_1,R_2) : R_1 \ge h(\beta * \alpha),\ R_2 \ge 1 - h(\beta),\ 0 \le \beta \le \tfrac{1}{2}\right\}, \tag{164}$$

where $h(\cdot)$ is the binary entropy function and $\beta * \alpha := \beta(1 - \alpha) + (1 - \beta)\alpha$ is the binary convolution. The above
region is attained by setting the backward test channel from $U$ to $Y$ to be a BSC with some crossover probability
$\beta$. All the elements in the entropy-information dispersion matrix $\mathbf{V}(\beta)$ can be evaluated in closed form in terms
of $\beta$. Define $\mathbf{J}(\beta) := [h(\beta * \alpha), 1 - h(\beta)]^T$. In Fig. 4, we plot the second-order region

$$\mathcal{R}_{\mathrm{in}}(n,\varepsilon) := \bigcup_{0 \le \beta \le \frac{1}{2}} \left\{(R_1,R_2) : \mathbf{R} \in \mathbf{J}(\beta) + \frac{\mathcal{S}(\mathbf{V}(\beta),\varepsilon)}{\sqrt{n}}\right\}. \tag{165}$$

The first-order region $\mathcal{R}^*_{\mathrm{WAK}}$ and the second-order region with simple time-sharing ($|\mathcal{T}| = 2$) are also shown for
comparison. More precisely, the simple time-sharing is between $\beta = 0$ and $\beta = 1/2$. As expected, as the blocklength
increases, the $(n, \varepsilon)$-optimal rate region tends to the first-order one. Interestingly, at small blocklengths, time-sharing
makes the second-order $(n, \varepsilon)$-optimal rate region in (165) larger compared to that without time-sharing.
In particular, the simple time-sharing is better than $\mathcal{R}_{\mathrm{in}}(n, \varepsilon)$ for $n = 500$ because the rank of the entropy-information
dispersion matrix $\lambda\mathbf{V}(0) + (1 - \lambda)\mathbf{V}(1/2)$ for $0 < \lambda < 1$ is one.$^5$
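The quantities $\mathbf{J}(\beta)$ and $\mathbf{V}(\beta)$ for the DSBS example can be obtained by direct enumeration of the joint distribution, which also confirms the rank statements above. The following Python sketch assumes the construction $Y$ uniform, $U = Y \oplus \mathrm{Bern}(\beta)$, $X = Y \oplus \mathrm{Bern}(\alpha)$ with $U - Y - X$ (equivalent to a BSC($\beta$) backward test channel here); the function names are illustrative and not taken from the paper.

```python
import math
from itertools import product


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1.0 - b) + (1.0 - a) * b


def wak_moments(alpha, beta):
    """Mean J and covariance V (in bits) of the entropy-information density
    vector (119) for DSBS(alpha) with P_{Y|U} = BSC(beta) and no time-sharing."""
    ab = conv(alpha, beta)
    atoms = []  # list of (probability, (j1, j2))
    for u, y, x in product((0, 1), repeat=3):
        prob = 0.5 * (beta if u != y else 1.0 - beta) * (alpha if x != y else 1.0 - alpha)
        if prob == 0.0:
            continue
        p_x_given_u = ab if x != u else 1.0 - ab
        p_y_given_u = beta if y != u else 1.0 - beta
        atoms.append((prob, (-math.log2(p_x_given_u), math.log2(p_y_given_u / 0.5))))
    mean = [sum(p * j[i] for p, j in atoms) for i in range(2)]
    cov = [[sum(p * (j[r] - mean[r]) * (j[c] - mean[c]) for p, j in atoms)
            for c in range(2)] for r in range(2)]
    return mean, cov


def time_share(lam, v_a, v_b):
    """Dispersion matrix lam * V_a + (1 - lam) * V_b, cf. (122) with |T| = 2."""
    return [[lam * v_a[r][c] + (1.0 - lam) * v_b[r][c] for c in range(2)]
            for r in range(2)]
```

With $\beta = 1/2$ both components of the density vector are constant, so $\mathbf{V}(1/2)$ is the zero matrix, while for $\beta = 0$ only the first component is random, which is why the time-shared matrix has rank one.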

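We also consider the region $\mathcal{R}_{\mathrm{V}}(n,\varepsilon)$, which is the analogue of $\mathcal{R}_{\mathrm{in}}(n, \varepsilon)$ but derived from Verdú's bound in
(42). In Fig. 5, we compare the second-order coefficients, namely that derived from our bound, $\mathcal{S}(\mathbf{V}(\beta), \varepsilon)$,
and

$$\mathcal{S}_{\mathrm{V}}(\mathbf{V}(\beta),\varepsilon) := \bigcup_{0 \le \lambda \le 1} \left\{(z_1,z_2) : z_1 \ge \sqrt{V_H(X|U)}\,Q^{-1}(\lambda\varepsilon),\ z_2 \ge \sqrt{V_I(U;Y)}\,Q^{-1}((1-\lambda)\varepsilon)\right\}. \tag{166}$$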

Note that the difference between the two regions is quite small even for $\varepsilon = 0.5$. This is because, for this example,
the covariance of the entropy- and information-density (off-diagonal in the dispersion matrix) is negative so the
difference between $\Pr(Z_1 > z_1 \text{ or } Z_2 > z_2)$ and $\Pr(Z_1 > z_1) + \Pr(Z_2 > z_2)$ is small. In this case, the 2-dimensional
Gaussian $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{V}(\beta))$ has a negative covariance and hence the probability mass in the first and
third quadrants is small. Hence, the union bound is not very loose in this case.
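The splitting in (166) can be sketched as follows: for a given $\lambda$, the corner point uses up exactly $\lambda\varepsilon$ and $(1-\lambda)\varepsilon$ of the error budget on the two marginals, so the union bound on $\Pr(Z_1 > z_1 \text{ or } Z_2 > z_2)$ equals $\varepsilon$ regardless of the correlation. The function names and the numerical dispersions in the usage note are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def verdu_point(v_h, v_i, eps, lam):
    """Corner point (z1, z2) of the set in (166) for a split 0 < lam < 1."""
    z1 = math.sqrt(v_h) * STD.inv_cdf(1.0 - lam * eps)
    z2 = math.sqrt(v_i) * STD.inv_cdf(1.0 - (1.0 - lam) * eps)
    return z1, z2


def union_bound(z1, z2, v_h, v_i):
    """Pr(Z1 > z1) + Pr(Z2 > z2); this ignores the correlation between Z1 and Z2."""
    tail1 = 1.0 - STD.cdf(z1 / math.sqrt(v_h))
    tail2 = 1.0 - STD.cdf(z2 / math.sqrt(v_i))
    return tail1 + tail2
```

For example, `union_bound(*verdu_point(2.0, 0.5, 0.1, 0.3), 2.0, 0.5)` evaluates to approximately `0.1`, the target error probability.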

Next, we consider the binary joint source given by $P_{X|Y}(1|0) = P_{X|Y}(0|1) = \alpha$ and $P_Y(0) = p \le \frac{1}{2}$, which is
a generalization of (163). This example was investigated in [53], and the optimal rate region reduces to

$$\mathcal{R}^*_{\mathrm{WAK}} = \left\{(R_1,R_2) : R_1 \ge h(\beta * \alpha),\ R_2 \ge h(p) - h(\beta),\ 0 \le \beta \le p\right\}. \tag{167}$$



5 It should be noted that the rank of $\mathbf{V}(1/2)$ is zero.







Fig. 4. A comparison between $\mathcal{R}_{\mathrm{in}}(n,\varepsilon)$ without time-sharing (solid line) and the time-sharing region (dashed line) for $\varepsilon = 0.1$. The
regions are to the top right of the curves. The blue and red curves are for $n = 500$ and $n = 10{,}000$ respectively. The black curve is the
first-order region ([T). 



The above region is attained by setting the backward test channel from $U$ to $Y$ to be a BSC with some crossover
probability $0 \le \beta \le p$. All the elements in the entropy-information dispersion matrix $\mathbf{V}(\beta)$ can be evaluated in
closed form in terms of $\beta$. Define $\mathbf{J}(\beta) := [h(\beta * \alpha), h(p) - h(\beta)]^T$. In Fig. 6, we plot the second-order region

$$\mathcal{R}_{\mathrm{in}}(n, \varepsilon) := \bigcup_{0 \le \beta \le p} \left\{(R_1,R_2) : \mathbf{R} \in \mathbf{J}(\beta) + \frac{\mathcal{S}(\mathbf{V}(\beta),\varepsilon)}{\sqrt{n}}\right\}. \tag{168}$$

For comparison, we also plot the second-order region derived from Remark 3. Around the corner point defined by
the entropies $[H(X|Y), H(Y)]^T = [h(\alpha), h(p)]^T$, we find that the bound from Remark 3 is tighter than that given
by (168).



B. Numerical Example for GP Problem 

In this section, we use an example to illustrate the inner bound on the $(n, \varepsilon)$-optimal rate for the GP problem obtained
in Theorem 30. We do not consider cost constraints here, i.e., $\Gamma = \infty$. We also neglect the small $O\left(\frac{\log n}{n}\right)$ term.
We consider the memory with stuck-at faults example [54] (see also [1, Example 7.3]). The state $S = 0$ corresponds
to a faulty memory cell that outputs 0 independent of the input value, the state $S = 1$ corresponds to a faulty
memory cell that outputs 1 independent of the input value, and the state $S = 2$ corresponds to a binary symmetric
channel with crossover probability $\alpha$. The probabilities of these states are $\frac{p}{2}$, $\frac{p}{2}$, and $1 - p$ respectively.

It is known [54] that the capacity is

$$C^*_{\mathrm{GP}} = (1-p)(1-h(\alpha)). \tag{169}$$

The above capacity is attained by setting $\mathcal{U} = \{0, 1\}$ and $P_{U|S}(0|0) = P_{U|S}(1|1) = 1 - \alpha$, $P_{U|S}(u|2) = \frac{1}{2}$, and
$X = U$. All the elements in the information dispersion matrix $\mathbf{V}$ can be evaluated in closed form. In Fig. 7, we
plot the second-order capacity

$$R_{\mathrm{GP}}(n,\varepsilon;p,\alpha) := (1 - p)(1 - h(\alpha)) - \frac{1}{\sqrt{n}}\min\{z_1 + z_2 : (z_1,z_2) \in \mathcal{S}(\mathbf{V},\varepsilon)\}. \tag{170}$$

Fig. 6. A comparison between $\mathcal{R}_{\mathrm{in}}(n, \varepsilon)$ (red solid curve) and the bound from Remark 3 (blue solid curve) for $\varepsilon = 0.1$ and $n = 1000$.
The regions are to the top right of the curves.



For comparison, let us consider the case in which the decoder, instead of the encoder, can access the state $S$. In
this case, we can regard $X$ as the channel input and $(S, Y)$ as the channel output. It is known [54] that the capacity
$C(W)$ of this channel is the same as (169). The dispersion $V$ can be evaluated in closed form by appealing to the
law of total variance [55]. In Fig. 7, we also plot the second-order capacity

$$C(n, \varepsilon;p, \alpha) := (1 - p)(1 - h(\alpha)) - \sqrt{\frac{V}{n}}\,Q^{-1}(\varepsilon). \tag{171}$$
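The law-of-total-variance computation mentioned above can be sketched numerically. The code below enumerates the information density of the channel $X \to (S, Y)$, assuming a uniform input (capacity-achieving here by symmetry) and $0 < \alpha < 1$, and compares the resulting variance with the decomposition $\mathbb{E}[\mathrm{Var}(\cdot|S)] + \mathrm{Var}(\mathbb{E}[\cdot|S])$. The closed-form expression and the function names are our own illustration and are not quoted from the paper.

```python
import math
from statistics import NormalDist


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def decoder_side_info_moments(p, alpha):
    """Mean and variance (bits) of log W(S,Y|X)/P_{SY}(S,Y) for the stuck-at
    memory with the state known at the decoder and a uniform input X."""
    atoms = [
        (p, 0.0),                                                    # stuck cell: output independent of X
        ((1.0 - p) * (1.0 - alpha), math.log2((1.0 - alpha) / 0.5)), # BSC, no flip
        ((1.0 - p) * alpha, math.log2(alpha / 0.5)),                 # BSC, flip
    ]
    mean = sum(q * i for q, i in atoms)
    var = sum(q * (i - mean) ** 2 for q, i in atoms)
    return mean, var


def total_variance_formula(p, alpha):
    """E[Var(i|S)] + Var(E[i|S]) for the same channel."""
    v_bsc = alpha * (1.0 - alpha) * math.log2((1.0 - alpha) / alpha) ** 2
    return (1.0 - p) * v_bsc + p * (1.0 - p) * (1.0 - h2(alpha)) ** 2


def second_order_capacity(p, alpha, n, eps):
    """Gaussian approximation C - sqrt(V / n) * Q^{-1}(eps), third-order terms ignored."""
    mean, var = decoder_side_info_moments(p, alpha)
    return mean - math.sqrt(var / n) * NormalDist().inv_cdf(1.0 - eps)
```

The mean of the information density recovers the first-order capacity $(1-p)(1-h(\alpha))$ in (169).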



From the figure, we find that the lower bound $R_{\mathrm{GP}}(n, \varepsilon;p, \alpha)$ on the GP $(n, \varepsilon)$-optimal rate is smaller than
the $(n, \varepsilon)$-optimal rate with decoder side-information even though the first-order rates coincide.



VIII. Conclusion and Further Work 

In this paper, we proved several new non-asymptotic bounds on the error probability for side-information coding 
problems, including the WAK, WZ and GP problems. These bounds then yield known general formulas as simple 
corollaries. In addition, we used these bounds to provide achievable second-order coding rates for these three side- 
information problems. We argued that when evaluated using i.i.d. test channels, the second-order rates resulting 
from our non-asymptotic bounds are the best known in the literature. In particular, they improve on Verdú's work
on non-asymptotic achievability bounds for multi-user information theory problems [6]. Other challenging problems
along the same lines include the Heegard-Berger [56] problem, multiple description coding [57], Marton's inner
bound for the broadcast channel [58], [59], compress-forward for the relay channel [60] and hypothesis testing with
multi-terminal data compression [61].
Appendix A
Proof of Proposition 1 (Expurgated Code)

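Proof: Let $x_0 \in \mathcal{X}$ be a prescribed constant satisfying $g(x_0) \le \Gamma$, and let $P_X^*$ be the distribution such that
$P_X^*(x_0) = 1$, i.e., $P_X^*(x) = \mathbf{1}[x = x_0]$. Then, we define

\[ \tilde{P}_{X|MS}(x|m,s) := P_{X|MS}(x|m,s)\,\mathbf{1}[g(x) \le \Gamma] + P_{X|MS}(\mathcal{T}_g^{\mathrm{GP}}(\Gamma)^c \,|\, m, s)\, P_X^*(x). \tag{172} \]

Then, it is obvious that $\tilde{P}_X(\mathcal{T}_g^{\mathrm{GP}}(\Gamma)) = 1$. We also have

\begin{align}
P_{MS\tilde{X}Y\hat{M}}[\hat{m} \ne m]
&= \sum_{m, \hat{m} \ne m} \sum_{s,x,y} P_M(m) P_S(s) \tilde{P}_{X|MS}(x|m,s) W(y|x,s) P_{\hat{M}|Y}(\hat{m}|y) \tag{173} \\
&= \sum_{m, \hat{m} \ne m} \sum_{s,x,y} P_M(m) P_S(s) P_{X|MS}(x|m,s) W(y|x,s) P_{\hat{M}|Y}(\hat{m}|y)\,\mathbf{1}[g(x) \le \Gamma] \notag \\
&\quad + \sum_{m, \hat{m} \ne m} \sum_{s,x,y} P_M(m) P_S(s) P_{X|MS}(\mathcal{T}_g^{\mathrm{GP}}(\Gamma)^c | m, s) P_X^*(x) W(y|x,s) P_{\hat{M}|Y}(\hat{m}|y) \tag{174} \\
&\le \sum_{m, \hat{m} \ne m} \sum_{s,x,y} P_M(m) P_S(s) P_{X|MS}(x|m,s) W(y|x,s) P_{\hat{M}|Y}(\hat{m}|y)\,\mathbf{1}[g(x) \le \Gamma] \notag \\
&\quad + \sum_{m, \hat{m}} \sum_{s,x,y} P_M(m) P_S(s) P_{X|MS}(\mathcal{T}_g^{\mathrm{GP}}(\Gamma)^c | m, s) P_X^*(x) W(y|x,s) P_{\hat{M}|Y}(\hat{m}|y) \tag{175} \\
&= \sum_{m, \hat{m} \ne m} \sum_{s,x,y} P_M(m) P_S(s) P_{X|MS}(x|m,s) W(y|x,s) P_{\hat{M}|Y}(\hat{m}|y)\,\mathbf{1}[g(x) \le \Gamma] \notag \\
&\quad + \sum_{m,s} P_M(m) P_S(s) P_{X|MS}(\mathcal{T}_g^{\mathrm{GP}}(\Gamma)^c | m, s) \tag{176} \\
&= P_{MSXY\hat{M}}[g(x) \le \Gamma \cap \hat{m} \ne m] + P_{MSXY\hat{M}}[g(x) > \Gamma] \tag{177} \\
&= P_{MSXY\hat{M}}[g(x) > \Gamma \cup \hat{m} \ne m] \tag{178}
\end{align}

as desired.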



Appendix B 
Channel Resolvability 

In this appendix, we review notation and known results for channel resolvability [7, Ch. 6], [13], [14].

As a start, we first review the properties of the variational distance. Let $\bar{\mathscr{P}}(\mathcal{U})$ be the set of all sub-normalized
non-negative functions (not necessarily probability distributions unless otherwise stated) on a finite set $\mathcal{U}$. Note that
if $P \in \bar{\mathscr{P}}(\mathcal{U})$ is normalized then $P \in \mathscr{P}(\mathcal{U})$, i.e., P is a distribution on $\mathcal{U}$. For $P, Q \in \bar{\mathscr{P}}(\mathcal{U})$, we define the
variational distance (divided by 2) as

\[ d(P,Q) := \frac{1}{2} \sum_{u \in \mathcal{U}} |P(u) - Q(u)|. \tag{179} \]

For two sets $\mathcal{U}$ and $\mathcal{Z}$, let $\bar{\mathscr{P}}(\mathcal{Z}|\mathcal{U})$ be the set of all sub-normalized non-negative functions indexed by $u \in \mathcal{U}$.
When $W \in \bar{\mathscr{P}}(\mathcal{Z}|\mathcal{U})$ is normalized, it is a channel. In this section, we denote the joint distribution induced by
$P \in \bar{\mathscr{P}}(\mathcal{U})$ and $W \in \bar{\mathscr{P}}(\mathcal{Z}|\mathcal{U})$ as $PW \in \bar{\mathscr{P}}(\mathcal{U} \times \mathcal{Z})$. The following properties are useful in the proofs of the theorems.

Lemma 33. The variational distance satisfies the following properties.

1) The monotonicity with respect to marginalization: For $P, Q \in \bar{\mathscr{P}}(\mathcal{U})$ and $W, V \in \bar{\mathscr{P}}(\mathcal{Z}|\mathcal{U})$, let $P', Q' \in \bar{\mathscr{P}}(\mathcal{Z})$ be

\[ P'(z) := \sum_u P(u) W(z|u), \qquad Q'(z) := \sum_u Q(u) V(z|u). \tag{180} \]

Then,

\[ d(P', Q') \le d(PW, QV). \tag{181} \]

2) The data-processing inequality: For $P, Q \in \bar{\mathscr{P}}(\mathcal{U})$ and $W \in \bar{\mathscr{P}}(\mathcal{Z}|\mathcal{U})$,

\[ d(PW, QW) \le d(P, Q). \tag{182} \]

3) For a distribution $P \in \mathscr{P}(\mathcal{U})$, a sub-normalized measure $Q \in \bar{\mathscr{P}}(\mathcal{U})$, and any subset $\mathcal{T} \subset \mathcal{U}$,

\[ P(\mathcal{T}) \le Q(\mathcal{T}) + d(P, Q) + \frac{1 - Q(\mathcal{U})}{2}. \tag{183} \]

Remark 4. Combining (181) and (182) (with V = W), we have

\[ d(P', Q') \le d(P, Q). \tag{184} \]

Although the above inequality is usually referred to as the data-processing inequality, we will use (182) in the proofs
of non-asymptotic bounds.
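The three properties of the variational distance listed above can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the helper names `vdist`, `joint` and `output`, and the particular vectors P, Q, W, V, are hypothetical and not taken from the paper. Q is deliberately sub-normalized.

```python
def vdist(P, Q):
    """Variational distance (divided by 2) between two non-negative vectors."""
    return 0.5 * sum(abs(p - q) for p, q in zip(P, Q))

def joint(P, W):
    """Joint measure PW(u, z) = P(u) W(z|u), flattened row by row."""
    return [P[u] * W[u][z] for u in range(len(P)) for z in range(len(W[u]))]

def output(P, W):
    """Output marginal P'(z) = sum_u P(u) W(z|u)."""
    return [sum(P[u] * W[u][z] for u in range(len(P))) for z in range(len(W[0]))]

P = [0.5, 0.3, 0.2]                        # a distribution
Q = [0.2, 0.3, 0.4]                        # sub-normalized: total mass 0.9
W = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]   # channel W(z|u)
V = [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]]   # channel V(z|u)
TOL = 1e-12                                # slack for floating-point rounding

# 1) Monotonicity under marginalization.
mono_ok = vdist(output(P, W), output(Q, V)) <= vdist(joint(P, W), joint(Q, V)) + TOL
# 2) Data-processing inequality with a common channel W.
dpi_ok = vdist(joint(P, W), joint(Q, W)) <= vdist(P, Q) + TOL
# 3) P(T) <= Q(T) + d(P, Q) + (1 - Q(U)) / 2 for the subset T = {0, 2}.
T = [0, 2]
prop3_ok = (sum(P[u] for u in T)
            <= sum(Q[u] for u in T) + vdist(P, Q) + (1 - sum(Q)) / 2 + TOL)
```

For normalized W the second inequality holds with equality, which is why a small tolerance is used in the comparison.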

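Proof: Since

\begin{align}
d(P', Q') &= \frac{1}{2} \sum_z \Big| \sum_u P(u) W(z|u) - \sum_u Q(u) V(z|u) \Big| \tag{185} \\
&\le \frac{1}{2} \sum_z \sum_u | P(u) W(z|u) - Q(u) V(z|u) | \tag{186} \\
&= d(PW, QV) \tag{187}
\end{align}

holds, we have (181). On the other hand, we have

\begin{align}
d(PW, QW) &= \frac{1}{2} \sum_{u,z} | P(u) W(z|u) - Q(u) W(z|u) | \tag{188} \\
&= \frac{1}{2} \sum_{u,z} W(z|u)\, | P(u) - Q(u) | \tag{189} \\
&\le \frac{1}{2} \sum_u | P(u) - Q(u) | \tag{190} \\
&= d(P, Q) \tag{191}
\end{align}

and thus, (182) holds.

Further, letting $\mathcal{U}_P = \{u : P(u) > Q(u)\}$ and $q = 1 - Q(\mathcal{U})$, we have

\begin{align}
d(P, Q) &= \frac{1}{2} \sum_u | P(u) - Q(u) | \tag{192} \\
&= \frac{1}{2} \big[ \{P(\mathcal{U}_P) - Q(\mathcal{U}_P)\} + \{Q(\mathcal{U}_P^c) - P(\mathcal{U}_P^c)\} \big] \tag{193} \\
&= \frac{1}{2} \big[ \{P(\mathcal{U}_P) - Q(\mathcal{U}_P)\} + \{1 - P(\mathcal{U}_P^c) - (Q(\mathcal{U}) - Q(\mathcal{U}_P^c)) - q\} \big] \tag{194} \\
&= P(\mathcal{U}_P) - Q(\mathcal{U}_P) - \frac{q}{2} \tag{195} \\
&\ge P(\mathcal{T}) - Q(\mathcal{T}) - \frac{q}{2}. \tag{196}
\end{align}

Hence, we have (183). ■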
Next, we introduce the concept of smoothing of a distribution [62]. For a distribution $P \in \mathscr{P}(\mathcal{U})$ and a subset
$\mathcal{T} \subset \mathcal{U}$, a smoothed sub-normalized function $\bar{P}$ of P is derived by

\[ \bar{P}(u) := P(u)\,\mathbf{1}[u \in \mathcal{T}]. \tag{197} \]

Note that the distance between the original distribution and a smoothed one is

\[ d(P, \bar{P}) = \frac{P(\mathcal{T}^c)}{2}. \tag{198} \]

Similarly, for a channel $W : \mathcal{U} \to \mathcal{Z}$ and a subset $\mathcal{T} \subset \mathcal{U} \times \mathcal{Z}$, a smoothed one $\bar{W}$ is derived by

\[ \bar{W}(z|u) := W(z|u)\,\mathbf{1}[(u, z) \in \mathcal{T}] \tag{199} \]

and it satisfies

\[ d(PW, P\bar{W}) = \frac{PW(\mathcal{T}^c)}{2}, \tag{200} \]

where $PW \in \mathscr{P}(\mathcal{U} \times \mathcal{Z})$ is the joint distribution induced by P and W.
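The smoothing operation and the resulting distance can be illustrated with a few lines of code. This is a minimal sketch on a three-letter alphabet; the helper name `smooth` and the chosen numbers are hypothetical.

```python
def smooth(P, T):
    """Keep P(u) for u in T and set it to zero elsewhere (sub-normalized result)."""
    return [P[u] if u in T else 0.0 for u in range(len(P))]

P = [0.5, 0.3, 0.2]
T = {0, 1}
P_bar = smooth(P, T)

# Variational distance (divided by 2) between P and its smoothed version.
dist = 0.5 * sum(abs(p - q) for p, q in zip(P, P_bar))
# Probability mass that smoothing removed, i.e. the mass of the complement of T.
mass_outside = sum(P[u] for u in range(len(P)) if u not in T)
```

Here the distance equals half of the removed mass, in line with the statement that smoothing costs exactly half of the probability of the complement of the retained set.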

Now, we consider the problem of channel resolvability. Let a channel $P_{Z|U} : \mathcal{U} \to \mathcal{Z}$ and an input distribution
$P_U$ be given. We would like to approximate the output distribution

\[ P_Z(z) = \sum_{u \in \mathcal{U}} P_U(u) P_{Z|U}(z|u) \tag{201} \]

by using $P_{Z|U}$ and as small an amount of randomness as possible. This is done by designing a
deterministic map from a finite set $\mathcal{I}$ to a codebook $\mathcal{C} = \{u_i\}_{i \in \mathcal{I}} \subset \mathcal{U}$. For a given resolvability code $\mathcal{C}$, let

\[ \tilde{P}_Z(z) = \sum_{i \in \mathcal{I}} \frac{1}{|\mathcal{I}|} P_{Z|U}(z|u_i) \tag{202} \]

be the simulated output distribution. The approximation error is evaluated by the distance $d(\tilde{P}_Z, P_Z)$.

We consider using the random coding technique as follows. We randomly and independently generate codewords
$u_1, u_2, \ldots, u_{|\mathcal{I}|}$ according to $P_U$. To derive an upper bound on the averaged approximation error $\mathbb{E}_{\mathcal{C}}[d(\tilde{P}_Z, P_Z)]$,
it is convenient to consider a smoothing operation defined as follows. For the set

\[ \mathcal{T}_c(\gamma_c) := \left\{ (u, z) : \log \frac{P_{Z|U}(z|u)}{P_Z(z)} \le \gamma_c \right\}, \tag{203} \]

let

\[ \bar{P}_{Z|U}(z|u) := P_{Z|U}(z|u)\,\mathbf{1}[(u, z) \in \mathcal{T}_c(\gamma_c)]. \tag{204} \]

Moreover, for a fixed resolvability code $\mathcal{C} = \{u_1, \ldots, u_{|\mathcal{I}|}\}$, let

\[ \tilde{\bar{P}}_Z(z) := \sum_{i \in \mathcal{I}} \frac{1}{|\mathcal{I}|} \bar{P}_{Z|U}(z|u_i). \tag{205} \]

Then, we have the following lemma.

Lemma 34 (Lemma 2 of [14]). For any $\gamma_c \ge 0$, we have

\[ \mathbb{E}_{\mathcal{C}}\big[ d(\bar{P}_Z, \tilde{\bar{P}}_Z) \big] \le \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UZ})}{|\mathcal{I}|}} \tag{206} \]

and

\[ \mathbb{E}_{\mathcal{C}}\big[ d(P_Z, \tilde{P}_Z) \big] \le P_{UZ}(\mathcal{T}_c(\gamma_c)^c) + \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UZ})}{|\mathcal{I}|}}, \tag{207} \]

where $\bar{P}_Z(z) = \sum_u P_U(u) \bar{P}_{Z|U}(z|u)$.

Remark 5. Although the statement of [14, Lemma 2] is that (207) holds, (206) is also proved in the proof of [14,
Lemma 2] (at the bottom left of [14, p. 1566]).

The definition of lx(y) in the proof of [14, Lemma 2] is incorrect. Hence, for completeness, we provide the
proof of (206) in Lemma 34.

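Proof: By Jensen's inequality and the convexity of $t \mapsto t^2$,

\[ \mathbb{E}_{\mathcal{C}}\big[ d(\bar{P}_Z, \tilde{\bar{P}}_Z) \big]^2 \le \mathbb{E}_{\mathcal{C}}\big[ d(\bar{P}_Z, \tilde{\bar{P}}_Z)^2 \big] = \frac{1}{4}\, \mathbb{E}_{\mathcal{C}}\big[ \| \tilde{\bar{P}}_Z - \bar{P}_Z \|_1^2 \big], \tag{208} \]

where $\|x\|_1 := \sum_i |x_i|$ is the $\ell_1$ norm. We now bound the term on the right as follows:

\begin{align}
\mathbb{E}_{\mathcal{C}}\big[ \| \tilde{\bar{P}}_Z - \bar{P}_Z \|_1^2 \big]
&= \mathbb{E}_{\mathcal{C}}\Big[ \Big( \sum_z | \tilde{\bar{P}}_Z(z) - \bar{P}_Z(z) | \Big)^2 \Big] \tag{209} \\
&= \mathbb{E}_{\mathcal{C}}\Big[ \Big( \sum_z \sqrt{P_Z(z)}\, \frac{| \tilde{\bar{P}}_Z(z) - \bar{P}_Z(z) |}{\sqrt{P_Z(z)}} \Big)^2 \Big] \tag{210} \\
&\le \mathbb{E}_{\mathcal{C}}\Big[ \sum_z P_Z(z) \Big( \frac{\tilde{\bar{P}}_Z(z) - \bar{P}_Z(z)}{P_Z(z)} \Big)^2 \Big] \tag{211} \\
&= \mathbb{E}_{\mathcal{C}, P_Z}\Big[ \Big( \frac{\tilde{\bar{P}}_Z - \bar{P}_Z}{P_Z} \Big)^2 \Big], \tag{212}
\end{align}

where (211) follows from the Cauchy-Schwarz inequality regarding $a_z := \sqrt{P_Z(z)}$ as one $|\mathcal{Z}|$-dimensional vector
and $b_z := | \tilde{\bar{P}}_Z(z) - \bar{P}_Z(z) | / \sqrt{P_Z(z)}$ as another $|\mathcal{Z}|$-dimensional vector. In fact, $|\mathcal{Z}|$ does not have to be finite for the
Cauchy-Schwarz inequality to be valid. Continuing, we have

\begin{align}
\mathbb{E}_{\mathcal{C}}\big[ \| \tilde{\bar{P}}_Z - \bar{P}_Z \|_1^2 \big]
&\le \mathbb{E}_{P_Z} \mathbb{E}_{\mathcal{C}}\Big[ \Big( \frac{\tilde{\bar{P}}_Z(z)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big)^2 \Big] \tag{213} \\
&= \mathbb{E}_{P_Z} \mathbb{E}_{\mathcal{C}}\Big[ \Big( \sum_{i \in \mathcal{I}} \frac{1}{|\mathcal{I}|} \Big( \frac{\bar{P}_{Z|U}(z|u_i)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big) \Big)^2 \Big] \tag{214} \\
&= \mathbb{E}_{P_Z} \mathbb{E}_{\mathcal{C}}\Big[ \sum_{i \in \mathcal{I}} \frac{1}{|\mathcal{I}|^2} \Big( \frac{\bar{P}_{Z|U}(z|u_i)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big)^2 \notag \\
&\qquad + \sum_{i \ne j} \frac{1}{|\mathcal{I}|^2} \Big( \frac{\bar{P}_{Z|U}(z|u_i)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big) \Big( \frac{\bar{P}_{Z|U}(z|u_j)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big) \Big] \tag{215} \\
&= \frac{1}{|\mathcal{I}|}\, \mathbb{E}_{P_Z} \mathbb{E}_{P_U}\Big[ \Big( \frac{\bar{P}_{Z|U}(z|u)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big)^2 \Big] \tag{216} \\
&\le \frac{1}{|\mathcal{I}|}\, \mathbb{E}_{P_Z} \mathbb{E}_{P_U}\Big[ \Big( \frac{\bar{P}_{Z|U}(z|u)}{P_Z(z)} \Big)^2 \Big] \tag{217} \\
&= \frac{1}{|\mathcal{I}|} \sum_{u,z} \frac{P_U(u)\, \bar{P}_{Z|U}(z|u)^2}{P_Z(z)} \tag{218} \\
&= \frac{1}{|\mathcal{I}|} \sum_{(u,z) \in \mathcal{T}_c(\gamma_c)} \frac{P_U(u)\, P_{Z|U}(z|u)^2}{P_Z(z)} \tag{219} \\
&= \frac{1}{|\mathcal{I}|}\, \Delta(\gamma_c, P_{UZ}), \tag{220}
\end{align}

where (216) follows from the fact that for $i \ne j$,

\begin{align}
&\mathbb{E}_{\mathcal{C}}\Big[ \Big( \frac{\bar{P}_{Z|U}(z|u_i)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big) \Big( \frac{\bar{P}_{Z|U}(z|u_j)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big) \Big] \notag \\
&\quad = \mathbb{E}_{\mathcal{C}}\Big[ \frac{\bar{P}_{Z|U}(z|u_i)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big]\, \mathbb{E}_{\mathcal{C}}\Big[ \frac{\bar{P}_{Z|U}(z|u_j)}{P_Z(z)} - \frac{\bar{P}_Z(z)}{P_Z(z)} \Big] \tag{221} \\
&\quad = 0, \tag{222}
\end{align}

and (219) follows from the definition of $\bar{P}_{Z|U}(z|u)$ in (204). Uniting (208) and (220) completes the proof of
Lemma 34. ■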
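We can relax (207) as

\[ \mathbb{E}_{\mathcal{C}}\big[ d(P_Z, \tilde{P}_Z) \big] \le P_{UZ}(\mathcal{T}_c(\gamma_c)^c) + \frac{1}{2} \sqrt{\frac{2^{\gamma_c}}{|\mathcal{I}|}} \tag{223} \]

by upper bounding $\Delta(\gamma_c, P_{UZ})$; cf. (75).

In some cases, we need to consider the noiseless channel, i.e., $\mathcal{Z} = \mathcal{U}$ and $W(z|u) = \mathbf{1}[u = z]$, and want to
approximate $P_U$ by

\[ \tilde{P}_U(u) = \sum_{i \in \mathcal{I}} \frac{1}{|\mathcal{I}|}\, \mathbf{1}[u_i = u]. \tag{224} \]

In this case, the bound (223) reduces to

\[ \mathbb{E}_{\mathcal{C}}\big[ d(P_U, \tilde{P}_U) \big] \le P_U\big[ -\log P_U(u) > \gamma_c \big] + \frac{1}{2} \sqrt{\frac{2^{\gamma_c}}{|\mathcal{I}|}}. \tag{225} \]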
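To get a feel for how loose or tight the relaxed resolvability bound (223) is, it can be compared with the exact averaged approximation error in a toy case. The sketch below assumes a binary symmetric channel with uniform input and i.i.d. uniform codewords, so that the number of codewords equal to 0 is binomial and the expectation can be computed exactly. The function names, the choice of channel, and the reading of (223) with base-2 logarithms are assumptions made for this illustration only.

```python
import math

def expected_resolvability_error(num_codewords: int, crossover: float) -> float:
    """Exact E_C[d(P_Z, simulated P_Z)] for a BSC with uniform input distribution."""
    n = num_codewords
    total = 0.0
    for k in range(n + 1):                 # k = number of codewords equal to 0
        prob = math.comb(n, k) / 2 ** n    # codewords are i.i.d. uniform on {0, 1}
        frac = k / n
        # Simulated output probability of z = 0; the true output is uniform.
        p0 = frac * (1 - crossover) + (1 - frac) * crossover
        total += prob * abs(p0 - 0.5)      # d = (|p0 - 1/2| + |p1 - 1/2|) / 2
    return total

def relaxed_bound(num_codewords: int, gamma_c: float, crossover: float) -> float:
    """Right-hand side of (223) for the same toy channel (0 < crossover < 1)."""
    tail = 0.0
    # Density ratio P_{Z|U}/P_Z is 2(1 - crossover) when z = u and 2*crossover otherwise.
    for ratio, mass in ((2 * (1 - crossover), 1 - crossover),
                        (2 * crossover, crossover)):
        if math.log2(ratio) > gamma_c:     # pair lies outside T_c(gamma_c)
            tail += mass
    return tail + 0.5 * math.sqrt(2 ** gamma_c / num_codewords)

err = expected_resolvability_error(64, 0.1)
bound = relaxed_bound(64, 1.0, 0.1)
```

In this example the threshold is chosen large enough that no pair is removed by smoothing, so the bound consists only of the square-root term.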



Appendix C 
Simulation of Test Channel 

In this appendix, we develop two lemmas which form crucial components of the proof of all CS-type bounds.
To do this, we consider the problem of channel simulation [15]-[19]. Roughly speaking, the problem is described
as follows. Let a joint distribution $P_{UZ}$ on $\mathcal{U} \times \mathcal{Z}$ be given. An observer (encoder) of Z describes it to a distant random
number generator (decoder) that produces $\hat{U}$ so that the simulated channel $P_{\hat{U}|Z}$ is statistically indistinguishable from
$P_{U|Z}$. We assume that the encoder and decoder have common randomness.

In the following, we construct a pair of encoder and decoder for this problem. More precisely, we construct a
stochastic map from $\mathcal{K} \times \mathcal{Z}$ to $\mathcal{L}$ and a map from $\mathcal{K} \times \mathcal{L}$ to $\mathcal{U}$, where $\mathcal{K}$ is the alphabet of the common randomness
and $\mathcal{L}$ is a message index set. We will use the notation introduced in Appendix B.

To construct a stochastic map from $\mathcal{K} \times \mathcal{Z}$ to $\mathcal{L}$, we first consider the channel resolvability code as follows.
Let us generate a codebook $\mathcal{C} = \{u_{11}, \ldots, u_{|\mathcal{K}||\mathcal{L}|}\}$, where each codeword $u_{kl}$ is randomly and independently
generated from $P_U$, which is the marginal of $P_{UZ}$. Let K and L be the uniform random numbers on $\mathcal{K}$ and $\mathcal{L}$
respectively. Moreover, let $\bar{P}_{Z|U}$ be a smoothed version of $P_{Z|U}$ defined in (204). Then, $\mathcal{C}$, K, L, and $\bar{P}_{Z|U}$ induce
the sub-normalized measure

\[ \tilde{P}_{KLUZ}(k, l, u, z) := \frac{1}{|\mathcal{K}||\mathcal{L}|}\, \bar{P}_{Z|U}(z|u)\, \mathbf{1}[u_{kl} = u]. \tag{226} \]

Marginals $\tilde{P}_{UZ|K}$, $\tilde{P}_{LZ|K}$, and $\tilde{P}_{Z|K}$ of $\tilde{P}_{KLUZ}$ are also induced as

\[ \tilde{P}_{UZ|K}(u, z|k) = \sum_l \frac{1}{|\mathcal{L}|}\, \bar{P}_{Z|U}(z|u)\, \mathbf{1}[u_{kl} = u], \tag{227} \]

\[ \tilde{P}_{LZ|K}(l, z|k) = \sum_u \frac{1}{|\mathcal{L}|}\, \bar{P}_{Z|U}(z|u)\, \mathbf{1}[u_{kl} = u], \tag{228} \]

and

\[ \tilde{P}_{Z|K}(z|k) = \sum_{u,l} \frac{1}{|\mathcal{L}|}\, \bar{P}_{Z|U}(z|u)\, \mathbf{1}[u_{kl} = u]. \tag{229} \]

Now, we define a stochastic map $\varphi_{\mathcal{C}} : \mathcal{K} \times \mathcal{Z} \to \mathcal{L}$ as (see footnote 6)

\[ \varphi_{\mathcal{C}}(l|k, z) := \frac{\tilde{P}_{LZ|K}(l, z|k)}{\tilde{P}_{Z|K}(z|k)}. \tag{230} \]

6 When $\tilde{P}_{Z|K}(z|k) = 0$, we define $\varphi_{\mathcal{C}}(l|k, z)$ arbitrarily.
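The construction of the stochastic map can be sketched in code for small finite alphabets. The following is a minimal illustration, assuming that (230) is the ratio of the per-index weight in (228) to the normalizer in (229); the function name `build_phi`, the data layout, and the toy codebook are hypothetical. The uniform fallback used when the normalizer vanishes is one arbitrary choice, as permitted by footnote 6.

```python
def build_phi(codebook, W_bar):
    """Build phi[k][z][l] = phi_C(l | k, z) from a codebook and a smoothed channel.

    codebook[k][l] is the codeword u_{kl} (an index into the alphabet of U);
    W_bar[u][z] is the smoothed (possibly sub-normalized) channel value.
    """
    phi = []
    for row in codebook:                       # one row per common-randomness value k
        num_l = len(row)
        per_z = []
        for z in range(len(W_bar[0])):
            # Weight of index l for this (k, z): smoothed channel value at u_{kl}, over |L|.
            weights = [W_bar[u][z] / num_l for u in row]
            total = sum(weights)               # normalizer for this (k, z)
            if total > 0:
                per_z.append([w / total for w in weights])
            else:
                per_z.append([1.0 / num_l] * num_l)   # arbitrary choice (footnote 6)
        phi.append(per_z)
    return phi

# Toy example: |K| = 2, |L| = 3, binary U and Z; the pair (u, z) = (0, 1) was smoothed out.
toy_codebook = [[0, 1, 1], [1, 1, 0]]
toy_channel = [[0.9, 0.0], [0.1, 0.9]]
phi = build_phi(toy_codebook, toy_channel)
```

Each `phi[k][z]` is a probability vector over the message index, so the encoder can sample an index from it given the common randomness and its observation.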
By using $\mathcal{C}$ and $\varphi_{\mathcal{C}}$, we can simulate the channel $P_{U|Z}$ as follows. The observer of Z (encoder) sends the realization
$l \in \mathcal{L}$ of the random experiment $\varphi_{\mathcal{C}}(\cdot|k, z)$ if the common randomness is k and Z = z. Receiving the index $l \in \mathcal{L}$
from the observer and the common randomness $k \in \mathcal{K}$, the random number generator (decoder) outputs $u_{kl} \in \mathcal{C}$.

Let $\hat{L}$ be the output of the stochastic map $\varphi_{\mathcal{C}}$ for the inputs K and Z. Then, the output $\hat{U}$ of the decoder is
$\hat{U} = u_{K\hat{L}}$ and the joint distribution of $K, \hat{L}, \hat{U}, Z$ is given by

\[ P_{K\hat{L}\hat{U}Z}(k, l, u, z) = \frac{1}{|\mathcal{K}|}\, P_Z(z)\, \varphi_{\mathcal{C}}(l|k, z)\, \mathbf{1}[u_{kl} = u]. \tag{231} \]

It should be noted here that the conditional distribution $P_{\hat{U}|KZ}$ induced by $P_{K\hat{L}\hat{U}Z}$ satisfies

\[ P_{\hat{U}|KZ}(u|k, z) = \frac{\tilde{P}_{UZ|K}(u, z|k)}{\tilde{P}_{Z|K}(z|k)} = \sum_l \varphi_{\mathcal{C}}(l|k, z)\, \mathbf{1}[u_{kl} = u]. \tag{232} \]

We also introduce a smoothed version of $P_{K\hat{L}\hat{U}Z}$ as follows:

\[ \bar{P}_{K\hat{L}\hat{U}Z}(k, l, u, z) = \frac{1}{|\mathcal{K}|}\, \bar{P}_Z(z)\, \varphi_{\mathcal{C}}(l|k, z)\, \mathbf{1}[u_{kl} = u], \tag{233} \]

where $\bar{P}_Z$ is the marginal of $\bar{P}_{UZ} := P_U \bar{P}_{Z|U}$, i.e., $\bar{P}_Z(z) := \sum_u P_U(u) \bar{P}_{Z|U}(z|u)$.

Now, we prove two lemmas which can be used to evaluate the performance of the channel simulation model
described above.


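Lemma 35. We have

\[ d(P_{\hat{U}Z}, \bar{P}_{UZ}) \le \frac{P_{UZ}((u,z) \notin \mathcal{T}_c(\gamma_c))}{2} + d(\bar{P}_{\hat{U}Z}, \bar{P}_{UZ}), \tag{234} \]

where $P_{\hat{U}Z}$ (resp. $\bar{P}_{\hat{U}Z}$) is the marginal of $P_{K\hat{L}\hat{U}Z}$ (resp. $\bar{P}_{K\hat{L}\hat{U}Z}$).

Proof: By the triangle inequality, we have

\[ d(P_{\hat{U}Z}, \bar{P}_{UZ}) \le d(P_{\hat{U}Z}, \bar{P}_{\hat{U}Z}) + d(\bar{P}_{\hat{U}Z}, \bar{P}_{UZ}). \tag{235} \]

Further, we can bound the first term of the right hand side of the above inequality as

\begin{align}
d(P_{\hat{U}Z}, \bar{P}_{\hat{U}Z}) &= d(P_Z P_{\hat{U}|Z}, \bar{P}_Z \bar{P}_{\hat{U}|Z}) \tag{236} \\
&\le d(P_Z, \bar{P}_Z) \tag{237} \\
&\le d(P_{UZ}, \bar{P}_{UZ}) \tag{238} \\
&= \frac{P_{UZ}((u,z) \in \mathcal{T}_c(\gamma_c)^c)}{2}, \tag{239}
\end{align}

where (237) follows from the fact that

\[ P_{\hat{U}|Z}(u|z) = \bar{P}_{\hat{U}|Z}(u|z) = \sum_{k,l} \frac{1}{|\mathcal{K}|}\, \varphi_{\mathcal{C}}(l|k, z)\, \mathbf{1}[u_{kl} = u] \tag{240} \]

and the data-processing inequality (182), (238) follows from the monotonicity property in (181), and (239) follows
from (200). ■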

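Lemma 36. For every $\gamma \ge 0$, we have

\[ \mathbb{E}_{\mathcal{C}}\big[ d(\bar{P}_{\hat{U}Z}, \bar{P}_{UZ}) \big] \le \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UZ})}{|\mathcal{L}|}} + \frac{1}{2} \sqrt{\frac{2^{\gamma}}{|\mathcal{K}||\mathcal{L}|}} + P_U\big[ -\log P_U(u) > \gamma \big]. \tag{241} \]

Proof: We have

\begin{align}
d(\bar{P}_{\hat{U}Z}, \bar{P}_{UZ}) &\le d(\bar{P}_{\hat{U}Z}, \tilde{P}_{UZ}) + d(\tilde{P}_{UZ}, \bar{P}_{UZ}) \tag{242} \\
&\le d(\bar{P}_{K\hat{U}Z}, \tilde{P}_{KUZ}) + d(\tilde{P}_{UZ}, \bar{P}_{UZ}) \tag{243} \\
&\le d(\bar{P}_{K\hat{U}Z}, \tilde{P}_{KUZ}) + d(\tilde{P}_U, P_U), \tag{244}
\end{align}

where (242) follows from the triangle inequality, (243) follows from the monotonicity (181), and (244) follows
from the data-processing inequality (182) and the fact that, by the definition of $\tilde{P}_{KLUZ}$,

\[ \tilde{P}_{Z|U}(z|u) = \bar{P}_{Z|U}(z|u). \tag{245} \]

We bound the first term in (244) as follows:

\begin{align}
d(\bar{P}_{K\hat{U}Z}, \tilde{P}_{KUZ}) &= d(P_K \bar{P}_Z \bar{P}_{\hat{U}|KZ}, \tilde{P}_{KUZ}) \tag{246} \\
&= d(P_K \bar{P}_Z \bar{P}_{\hat{U}|KZ}, P_K \tilde{P}_{UZ|K}) \tag{247} \\
&= d(P_K \bar{P}_Z \bar{P}_{\hat{U}|KZ}, P_K \tilde{P}_{Z|K} P_{\hat{U}|KZ}) \tag{248} \\
&\le d(P_K \bar{P}_Z, P_K \tilde{P}_{Z|K}) \tag{249} \\
&\le \sum_{k \in \mathcal{K}} \frac{1}{|\mathcal{K}|}\, d(\bar{P}_Z(\cdot), \tilde{P}_{Z|K}(\cdot|k)), \tag{250}
\end{align}

where (248) follows from the first equality in (232), and (249) follows from the data-processing inequality (182)
and the fact that

\[ \bar{P}_{\hat{U}|KZ}(u|k, z) = P_{\hat{U}|KZ}(u|k, z) = \sum_l \varphi_{\mathcal{C}}(l|k, z)\, \mathbf{1}[u_{kl} = u]. \tag{251} \]

Taking the expectation with respect to the codebook $\mathcal{C}$ yields

\begin{align}
\mathbb{E}_{\mathcal{C}}\big[ d(\bar{P}_{K\hat{U}Z}, \tilde{P}_{KUZ}) \big] &\le \sum_{k \in \mathcal{K}} \frac{1}{|\mathcal{K}|}\, \mathbb{E}_{\mathcal{C}}\big[ d(\bar{P}_Z(\cdot), \tilde{P}_{Z|K}(\cdot|k)) \big] \tag{252} \\
&\le \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UZ})}{|\mathcal{L}|}}, \tag{253}
\end{align}

where the second inequality is obtained by using (206) for each $k \in \mathcal{K}$.
Further, we have

\[ \mathbb{E}_{\mathcal{C}}\big[ d(\tilde{P}_U, P_U) \big] \le P_U\big[ -\log P_U(u) > \gamma \big] + \frac{1}{2} \sqrt{\frac{2^{\gamma}}{|\mathcal{K}||\mathcal{L}|}} \tag{254} \]

by (225).

Combining (244), (253), and (254) completes the proof of the lemma. ■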



Appendix D 

Proof of the First Non-Asymptotic Bound for WAK in Theorem 14
A. Code Construction

We construct a WAK code by using the common randomness for the helper and the decoder. To do this, we use
the stochastic map introduced in Appendix C. Let $\mathcal{K}$ be the alphabet of the common randomness for the helper
and the decoder. Further, let $\mathcal{Z} = \mathcal{Y}$ and Z = Y, that is, let $P_{UZ} = P_{UY}$, where $P_{UY}$ is the marginal of the given
distribution $P_{UXY} \in \mathscr{P}(P_{XY})$. It should be noted here that, in this case, $\mathcal{T}_c(\gamma_c)$ defined in (203) is equivalent to
$\mathcal{T}_c^{\mathrm{WAK}}(\gamma_c)$ defined in (40). Now, let us consider the stochastic map $\varphi_{\mathcal{C}}$ defined in (230).

By using $\varphi_{\mathcal{C}}$, we construct a WAK code $\Phi$ as follows. The main encoder uses a random bin coding $f : \mathcal{X} \to \mathcal{M}$.
The helper uses the stochastic map $\varphi_{\mathcal{C}} : \mathcal{K} \times \mathcal{Y} \to \mathcal{L}$. That is, when the side information is $y \in \mathcal{Y}$ and the common
randomness is $k \in \mathcal{K}$, the helper generates $l \in \mathcal{L}$ according to $\varphi_{\mathcal{C}}(\cdot|k, y)$ and sends l to the decoder. For given
$m \in \mathcal{M}$, $l \in \mathcal{L}$, and common randomness $k \in \mathcal{K}$, the decoder outputs the unique $\hat{x} \in \mathcal{X}$ such that $f(\hat{x}) = m$ and

\[ (u_{kl}, \hat{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b). \tag{255} \]

If no such $\hat{x}$ exists, or if there is more than one such $\hat{x}$, then a decoding error is declared.
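B. Analysis of Error Probability

Let $\hat{L}$ be the random index chosen by the helper via the stochastic map $\varphi_{\mathcal{C}}(\cdot|K, Y)$, and let $\hat{U} = u_{K\hat{L}}$. Note
that the joint distribution of $K, \hat{L}, \hat{U}, Y$ is given as follows; cf. (231)

\[ P_{K\hat{L}\hat{U}Y}(k, l, u, y) = \frac{1}{|\mathcal{K}|}\, P_Y(y)\, \varphi_{\mathcal{C}}(l|k, y)\, \mathbf{1}[u_{kl} = u] \tag{256} \]

and then, the joint distribution of $K, \hat{L}, \hat{U}, Y$ and X is given as

\[ P_{K\hat{L}\hat{U}XY}(k, l, u, x, y) = P_{K\hat{L}\hat{U}Y}(k, l, u, y)\, P_{X|Y}(x|y). \tag{257} \]

The smoothed versions $\bar{P}_{K\hat{L}\hat{U}Y}$ and $\bar{P}_{K\hat{L}\hat{U}XY}$ are given by substituting $P_Y$ in (256) with $\bar{P}_Y$; cf. (233).
If the decoding error occurs, at least one of the following events occurs:

\[ \mathcal{E}_1 := \{ (u_{kl}, x) \notin \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \} \]

\[ \mathcal{E}_2 := \{ \exists\, \tilde{x} \ne x \text{ s.t. } f(\tilde{x}) = f(x),\ (u_{kl}, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \}. \]

Hence, the error probability averaged over random coding f, the random codebook $\mathcal{C}$, and the common randomness
K can be bounded as

\[ \mathbb{E}_f \mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi)] \le \mathbb{E}_f \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{L}\hat{U}XY}(\mathcal{E}_1 \cup \mathcal{E}_2) \big]. \tag{258} \]

Let

\[ \mathcal{E}_{12} := \{ (u, x) : (u, x) \notin \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \text{ or } \exists\, \tilde{x} \ne x \text{ s.t. } f(\tilde{x}) = f(x),\ (u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \}. \tag{259} \]

Then, for fixed f and $\mathcal{C}$, we have

\begin{align}
&P_{K\hat{L}\hat{U}XY}(\mathcal{E}_1 \cup \mathcal{E}_2) \notag \\
&= \sum_{k,l,u,x,y} P_{K\hat{L}\hat{U}XY}(k, l, u, x, y)\, \mathbf{1}[(u, x) \in \mathcal{E}_{12}] \tag{260} \\
&= \sum_{k,l,u,x,y} P_{K\hat{L}\hat{U}Y}(k, l, u, y)\, P_{X|Y}(x|y)\, \mathbf{1}[(u, x) \in \mathcal{E}_{12}] \tag{261} \\
&= \sum_{u,x,y} P_{\hat{U}Y}(u, y)\, P_{X|Y}(x|y)\, \mathbf{1}[(u, x) \in \mathcal{E}_{12}] \tag{262} \\
&= P_{\hat{U}XY}((u, x) \in \mathcal{E}_{12}) \tag{263} \\
&\le \bar{P}_{UXY}((u, x) \in \mathcal{E}_{12}) + \frac{1 - \bar{P}_{UXY}(\mathcal{U} \times \mathcal{X} \times \mathcal{Y})}{2} + d(P_{\hat{U}XY}, \bar{P}_{UXY}) \tag{264} \\
&= \bar{P}_{UXY}((u, x) \in \mathcal{E}_{12}) + \frac{P_{UXY}((u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c))}{2} + d(P_{\hat{U}XY}, \bar{P}_{UXY}) \tag{265} \\
&= \bar{P}_{UXY}((u, x) \in \mathcal{E}_{12}) + \frac{P_{UY}((u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c))}{2} + d(P_{\hat{U}Y} P_{X|Y}, \bar{P}_{UY} P_{X|Y}) \tag{266} \\
&\le \bar{P}_{UXY}((u, x) \in \mathcal{E}_{12}) + \frac{P_{UY}((u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c))}{2} + d(P_{\hat{U}Y}, \bar{P}_{UY}) \tag{267} \\
&\le \bar{P}_{UXY}((u, x) \in \mathcal{E}_{12}) + P_{UY}((u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c)) + d(\bar{P}_{\hat{U}Y}, \bar{P}_{UY}) \tag{268} \\
&\le \bar{P}_{UXY}((u, x) \notin \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)) + \bar{P}_{UXY}[\exists\, \tilde{x} \ne x \text{ s.t. } f(\tilde{x}) = f(x),\ (u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \notag \\
&\quad + P_{UY}((u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c)) + d(\bar{P}_{\hat{U}Y}, \bar{P}_{UY}) \tag{269} \\
&= P_{UXY}((u, x) \notin \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \cap (u, y) \in \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c)) + P_{UY}((u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c)) + d(\bar{P}_{\hat{U}Y}, \bar{P}_{UY}) \notag \\
&\quad + \bar{P}_{UXY}[\exists\, \tilde{x} \ne x \text{ s.t. } f(\tilde{x}) = f(x),\ (u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \tag{270} \\
&= P_{UXY}((u, x) \notin \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \cup (u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c)) + d(\bar{P}_{\hat{U}Y}, \bar{P}_{UY}) \notag \\
&\quad + \bar{P}_{UXY}[\exists\, \tilde{x} \ne x \text{ s.t. } f(\tilde{x}) = f(x),\ (u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)], \tag{271}
\end{align}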
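where (264) follows from (183), (267) follows from the data-processing inequality (182), and (268) follows from
Lemma 35. By taking the average over $\mathcal{C}$, the second term in (271) is upper bounded as

\[ \mathbb{E}_{\mathcal{C}}\big[ d(\bar{P}_{\hat{U}Y}, \bar{P}_{UY}) \big] \le \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UY})}{|\mathcal{L}|}} + \frac{1}{2} \sqrt{\frac{2^{\gamma}}{|\mathcal{K}||\mathcal{L}|}} + P_U\big[ -\log P_U(u) > \gamma \big] \tag{272} \]

by using Lemma 36. On the other hand, by taking the average over f, the last term in (271) is upper bounded as

\begin{align}
&\mathbb{E}_f\big[ \bar{P}_{UXY}[\exists\, \tilde{x} \ne x \text{ s.t. } f(\tilde{x}) = f(x),\ (u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \big] \notag \\
&\le \sum_{u,x,y} \bar{P}_{UXY}(u, x, y) \sum_{\tilde{x} \ne x} \mathbb{E}_f\big[ \mathbf{1}[f(\tilde{x}) = f(x)] \big]\, \mathbf{1}[(u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \tag{273} \\
&\le \frac{1}{|\mathcal{M}|} \sum_{u, \tilde{x}} P_U(u)\, \mathbf{1}[(u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \tag{274} \\
&= \frac{1}{|\mathcal{M}|} \sum_{(u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)} P_U(u), \tag{275}
\end{align}

where we used the fact that $\sum_{x,y} \bar{P}_{UXY}(u, x, y) \le P_U(u)$ in (274). Hence, by (271), (272), and (275), we have

\begin{align}
\mathbb{E}_f \mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi)] &\le \mathbb{E}_f \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{L}\hat{U}XY}(\mathcal{E}_1 \cup \mathcal{E}_2) \big] \tag{276} \\
&\le P_{UXY}((u, x) \notin \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \cup (u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c)) + \frac{1}{|\mathcal{M}|} \sum_{(u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)} P_U(u) \notag \\
&\quad + \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UY})}{|\mathcal{L}|}} + \frac{1}{2} \sqrt{\frac{2^{\gamma}}{|\mathcal{K}||\mathcal{L}|}} + P_U\big[ -\log P_U(u) > \gamma \big]. \tag{277}
\end{align}

Since we can choose $\gamma > 0$ and $|\mathcal{K}|$ arbitrarily large, we have

\begin{align}
\mathbb{E}_f \mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi)] &\le P_{UXY}((u, x) \notin \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \cup (u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c)) \notag \\
&\quad + \frac{1}{|\mathcal{M}|} \sum_{(u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)} P_U(u) + \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UY})}{|\mathcal{L}|}} + \delta. \tag{278}
\end{align}

Consequently, there exists at least one code $(k, f, \mathcal{C})$ such that $P_e(\Phi)$ is smaller than the right-hand-side of the
inequality above. This completes the proof of Theorem 14.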



Appendix E 

Proof of the Second Non-Asymptotic Bound for WAK in Theorem 16
To prove Theorem 16, we modify the proof of Theorem 14 as follows.

First, we use $\mathcal{J} = \{1, \ldots, J\}$ instead of $\mathcal{L}$ in the construction of $\varphi_{\mathcal{C}}$, where J is the given integer.

Then, the helper and the decoder are modified as follows. The helper first uses the stochastic map $\varphi_{\mathcal{C}} : \mathcal{K} \times \mathcal{Y} \to \mathcal{J}$.
That is, it generates $j \in \mathcal{J}$ according to $\varphi_{\mathcal{C}}(\cdot|k, y)$ when the side information is $y \in \mathcal{Y}$ and the common randomness
is $k \in \mathcal{K}$. Then, the helper sends j by using random bin coding $\kappa : \mathcal{J} \to \mathcal{L}$. This means that to every $j \in \mathcal{J}$, it
independently and uniformly assigns a random index $l \in \mathcal{L}$. For given $m \in \mathcal{M}$, $l \in \mathcal{L}$, and the common randomness
$k \in \mathcal{K}$, the decoder outputs the unique $\hat{x} \in \mathcal{X}$ such that $f(\hat{x}) = m$ and

\[ (u_{kj}, \hat{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \tag{279} \]

for some $j \in \mathcal{J}$ satisfying $\kappa(j) = l$. If no such $\hat{x}$ exists, or if there is more than one such $\hat{x}$, then a decoding
error is declared.
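The analysis of the probability of the decoding error is modified as follows: Let $\hat{J}$ be the random index chosen
by the helper via the stochastic map $\varphi_{\mathcal{C}}(\cdot|K, Y)$, and let $\hat{U} = u_{K\hat{J}}$. The joint distribution of $K, \hat{J}, \hat{U}, Y$ is

\[ P_{K\hat{J}\hat{U}Y}(k, j, u, y) = \frac{1}{|\mathcal{K}|}\, P_Y(y)\, \varphi_{\mathcal{C}}(j|k, y)\, \mathbf{1}[u_{kj} = u]. \tag{280} \]

The other measures $P_{K\hat{J}\hat{U}XY}$, $\bar{P}_{K\hat{J}\hat{U}Y}$, and $\bar{P}_{K\hat{J}\hat{U}XY}$ are given similarly. On the other hand, if the decoding error
occurs, at least one of the following occurs:

\[ \mathcal{E}_1 := \{ (u_{kj}, x) \notin \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \} \tag{281} \]

\[ \mathcal{E}_2 := \{ \exists\, \tilde{x} \ne x \text{ s.t. } f(\tilde{x}) = f(x),\ (u_{kj}, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \} \tag{282} \]

\[ \mathcal{E}_3 := \{ \exists\, \tilde{x} \ne x,\ \tilde{j} \ne j \text{ s.t. } f(\tilde{x}) = f(x),\ \kappa(\tilde{j}) = \kappa(j),\ (u_{k\tilde{j}}, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \}. \tag{283} \]

Hence, the error probability averaged over random coding f, $\kappa$, the random codebook $\mathcal{C}$, and the common randomness
K can be bounded as

\begin{align}
\mathbb{E}_{f,\kappa} \mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi)] &\le \mathbb{E}_{f,\kappa} \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{J}\hat{U}XY}(\mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{E}_3) \big] \tag{284} \\
&\le \mathbb{E}_f \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{J}\hat{U}XY}(\mathcal{E}_1 \cup \mathcal{E}_2) \big] + \mathbb{E}_{f,\kappa} \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{J}\hat{U}XY}(\mathcal{E}_3) \big]. \tag{285}
\end{align}

The first term in (285) is upper bounded in the same way as bounding (258), and we have

\begin{align}
\mathbb{E}_f \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{J}\hat{U}XY}(\mathcal{E}_1 \cup \mathcal{E}_2) \big] &\le P_{UXY}((u, x) \notin \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \cup (u, y) \notin \mathcal{T}_c^{\mathrm{WAK}}(\gamma_c)) \notag \\
&\quad + \frac{1}{|\mathcal{M}|} \sum_{(u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)} P_U(u) + \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UY})}{|\mathcal{J}|}} + \delta. \tag{286}
\end{align}

On the other hand, the second term in (285) is upper bounded as follows.

\begin{align}
&\mathbb{E}_{f,\kappa} \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{J}\hat{U}XY}(\mathcal{E}_3) \big] \notag \\
&= \mathbb{E}_{f,\kappa} \mathbb{E}_{\mathcal{C}} \Big[ \sum_{k,j,u,x,y} P_{K\hat{J}\hat{U}XY}(k, j, u, x, y)\, \mathbf{1}\big[ \exists\, \tilde{x} \ne x, \tilde{j} \ne j \text{ s.t. } f(\tilde{x}) = f(x),\ \kappa(\tilde{j}) = \kappa(j),\ (u_{k\tilde{j}}, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b) \big] \Big] \tag{287} \\
&\le \mathbb{E}_{f,\kappa} \mathbb{E}_{\mathcal{C}} \Big[ \sum_{k,j,u,x,y} P_{K\hat{J}\hat{U}XY}(k, j, u, x, y) \sum_{\tilde{x} \ne x,\, \tilde{j} \ne j} \mathbf{1}[f(\tilde{x}) = f(x),\ \kappa(\tilde{j}) = \kappa(j)] \cdot \mathbf{1}[(u_{k\tilde{j}}, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \Big] \tag{288} \\
&= \mathbb{E}_{f,\kappa} \mathbb{E}_{\mathcal{C}} \Big[ \sum_{k,j,x,y} P_{K\hat{J}XY}(k, j, x, y) \sum_{\tilde{x} \ne x,\, \tilde{j} \ne j} \mathbf{1}[f(\tilde{x}) = f(x),\ \kappa(\tilde{j}) = \kappa(j)] \cdot \mathbf{1}[(u_{k\tilde{j}}, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \Big] \tag{289} \\
&\le \frac{1}{|\mathcal{M}||\mathcal{L}|}\, \mathbb{E}_{\mathcal{C}} \Big[ \sum_{k,j,x,y} P_{K\hat{J}XY}(k, j, x, y) \sum_{\tilde{x}, \tilde{j}} \mathbf{1}[(u_{k\tilde{j}}, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \Big] \tag{290} \\
&= \frac{1}{|\mathcal{M}||\mathcal{L}|}\, \mathbb{E}_{\mathcal{C}} \Big[ \sum_k P_K(k) \sum_{\tilde{x}, \tilde{j}} \mathbf{1}[(u_{k\tilde{j}}, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \Big] \tag{291} \\
&= \frac{1}{|\mathcal{M}||\mathcal{L}|} \sum_k P_K(k) \sum_{\tilde{x}, \tilde{j}} \mathbb{E}_{\mathcal{C}} \big[ \mathbf{1}[(u_{k\tilde{j}}, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \big] \tag{292} \\
&= \frac{|\mathcal{J}|}{|\mathcal{M}||\mathcal{L}|} \sum_{u, \tilde{x}} P_U(u)\, \mathbf{1}[(u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)] \tag{293} \\
&= \frac{|\mathcal{J}|}{|\mathcal{M}||\mathcal{L}|} \sum_{(u, \tilde{x}) \in \mathcal{T}_b^{\mathrm{WAK}}(\gamma_b)} P_U(u), \tag{294}
\end{align}

where (293) follows from the fact that $\mathcal{C}$ is generated according to $P_U$. Substituting (286) and (294) into (285), we
have Theorem 16.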



Appendix F 

Proof of the Non-Asymptotic Bound for WZ in Theorem 17
A. Code Construction

Similar to WAK coding in the previous two sections, we use the stochastic map introduced in Appendix C. In
WZ coding, let $\mathcal{K}$ be the alphabet of the common randomness for the encoder and the decoder, and let Z = X and
$P_{UZ} = P_{UX}$. Note that $\mathcal{T}_c(\gamma_c)$ defined in (203) is equivalent to $\mathcal{T}_c^{\mathrm{WZ}}(\gamma_c)$ defined in (54). Now, let us consider the
stochastic map $\varphi_{\mathcal{C}}$ defined in (230).

By using $\varphi_{\mathcal{C}}$, we construct a WZ code $\Phi$ as follows. The encoder first uses the stochastic map $\varphi_{\mathcal{C}} : \mathcal{K} \times \mathcal{X} \to \mathcal{L}$.
That is, it generates $l \in \mathcal{L}$ according to $\varphi_{\mathcal{C}}(\cdot|k, x)$ when the source output is $x \in \mathcal{X}$ and the common randomness
is $k \in \mathcal{K}$. Then, the encoder sends l by using random bin coding $\kappa : \mathcal{L} \to \mathcal{M}$. This means that to every $l \in \mathcal{L}$,
it independently and uniformly assigns a random index $m \in \mathcal{M}$. For given $m \in \mathcal{M}$, $y \in \mathcal{Y}$, and the common
randomness $k \in \mathcal{K}$, the decoder finds the unique index $\hat{l} \in \mathcal{L}$ such that $\kappa(\hat{l}) = m$ and

\[ (u_{k\hat{l}}, y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p). \tag{295} \]

Then, the decoder outputs $g(u_{k\hat{l}}, y) \in \hat{\mathcal{X}}$. If no index $\hat{l}$ satisfying (295) exists, or if there is more than one such
index, then a decoding error is declared.



B. Analysis of Probability of Excess Distortion 

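Let $\hat{L}$ be the random index chosen by the encoder via the stochastic map $\varphi_{\mathcal{C}}(\cdot|K, X)$, and let $\hat{U} = u_{K\hat{L}}$. Note
that the joint distribution of $K, \hat{L}, \hat{U}, X$ is given as follows; cf. (231)

\[ P_{K\hat{L}\hat{U}X}(k, l, u, x) = \frac{1}{|\mathcal{K}|}\, P_X(x)\, \varphi_{\mathcal{C}}(l|k, x)\, \mathbf{1}[u_{kl} = u] \tag{296} \]

and then, the joint distribution of $K, \hat{L}, \hat{U}, X$ and Y is given as

\[ P_{K\hat{L}\hat{U}XY}(k, l, u, x, y) = P_{K\hat{L}\hat{U}X}(k, l, u, x)\, P_{Y|X}(y|x). \tag{297} \]

The smoothed versions $\bar{P}_{K\hat{L}\hat{U}X}$ and $\bar{P}_{K\hat{L}\hat{U}XY}$ are given by substituting $P_X$ in (296) with $\bar{P}_X$; cf. (233).
If the distortion exceeds D, at least one of the following events occurs:

\[ \mathcal{E}_0 := \{ (u_{kl}, x, y) \notin \mathcal{T}_d^{\mathrm{WZ}}(D) \} \tag{298} \]
\[ \mathcal{E}_1 := \{ (u_{kl}, y) \notin \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p) \} \tag{299} \]
\[ \mathcal{E}_2 := \{ \exists\, \tilde{l} \ne l \text{ s.t. } \kappa(\tilde{l}) = \kappa(l),\ (u_{k\tilde{l}}, y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p) \}. \tag{300} \]

Hence, the probability of excess distortion averaged over the random coding $\kappa$, the random codebook $\mathcal{C}$, and the
common randomness K can be bounded as

\begin{align}
\mathbb{E}_{\kappa} \mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi; D)] &\le \mathbb{E}_{\kappa} \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{L}\hat{U}XY}(\mathcal{E}_0 \cup \mathcal{E}_1 \cup \mathcal{E}_2) \big] \tag{301} \\
&\le \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{L}\hat{U}XY}(\mathcal{E}_0 \cup \mathcal{E}_1) \big] + \mathbb{E}_{\kappa} \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{L}\hat{U}XY}(\mathcal{E}_2) \big]. \tag{302}
\end{align}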
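At first, we evaluate the first term in (302). For fixed $\mathcal{C}$, writing $\mathcal{A} := \{ (u, x, y) \notin \mathcal{T}_d^{\mathrm{WZ}}(D) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p) \}$ for brevity,

\begin{align}
&P_{K\hat{L}\hat{U}XY}(\mathcal{E}_0 \cup \mathcal{E}_1) \notag \\
&= \sum_{k,l,u,x,y} P_{K\hat{L}\hat{U}X}(k, l, u, x)\, P_{Y|X}(y|x)\, \mathbf{1}[(u, x, y) \notin \mathcal{T}_d^{\mathrm{WZ}}(D) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)] \tag{303} \\
&= \sum_{u,x,y} P_{\hat{U}X}(u, x)\, P_{Y|X}(y|x)\, \mathbf{1}[(u, x, y) \notin \mathcal{T}_d^{\mathrm{WZ}}(D) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)] \tag{304} \\
&= P_{\hat{U}XY}(\mathcal{A}) \tag{305} \\
&\le \bar{P}_{UXY}(\mathcal{A}) + \frac{1 - \bar{P}_{UXY}(\mathcal{U} \times \mathcal{X} \times \mathcal{Y})}{2} + d(P_{\hat{U}XY}, \bar{P}_{UXY}) \tag{306} \\
&= \bar{P}_{UXY}(\mathcal{A}) + \frac{P_{UXY}((u, x) \notin \mathcal{T}_c^{\mathrm{WZ}}(\gamma_c))}{2} + d(P_{\hat{U}XY}, \bar{P}_{UXY}) \tag{307} \\
&= \bar{P}_{UXY}(\mathcal{A}) + \frac{P_{UX}((u, x) \notin \mathcal{T}_c^{\mathrm{WZ}}(\gamma_c))}{2} + d(P_{\hat{U}X} P_{Y|X}, \bar{P}_{UX} P_{Y|X}) \tag{308} \\
&\le \bar{P}_{UXY}(\mathcal{A}) + \frac{P_{UX}((u, x) \notin \mathcal{T}_c^{\mathrm{WZ}}(\gamma_c))}{2} + d(P_{\hat{U}X}, \bar{P}_{UX}) \tag{309} \\
&\le \bar{P}_{UXY}(\mathcal{A}) + P_{UX}((u, x) \notin \mathcal{T}_c^{\mathrm{WZ}}(\gamma_c)) + d(\bar{P}_{\hat{U}X}, \bar{P}_{UX}) \tag{310} \\
&= P_{UXY}((u, x) \notin \mathcal{T}_c^{\mathrm{WZ}}(\gamma_c) \cup (u, x, y) \notin \mathcal{T}_d^{\mathrm{WZ}}(D) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)) + d(\bar{P}_{\hat{U}X}, \bar{P}_{UX}), \tag{311}
\end{align}

where (306) follows from (183), (309) follows from the data-processing inequality (182), (310) follows from
Lemma 35, and (311) follows from the fact that $\bar{P}_{UXY}$ is the smoothed version of $P_{UXY}$ with respect to $\mathcal{T}_c^{\mathrm{WZ}}(\gamma_c)$.
By taking the average over $\mathcal{C}$, we see that the second term in (311) is upper bounded as

\[ \mathbb{E}_{\mathcal{C}}\big[ d(\bar{P}_{\hat{U}X}, \bar{P}_{UX}) \big] \le \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UX})}{|\mathcal{L}|}} + \frac{1}{2} \sqrt{\frac{2^{\gamma}}{|\mathcal{K}||\mathcal{L}|}} + P_U\big[ -\log P_U(u) > \gamma \big], \tag{312} \]

where we used Lemma 36.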

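Next, we upper bound the second term in (302):

\begin{align}
&\mathbb{E}_{\kappa} \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{L}\hat{U}XY}(\mathcal{E}_2) \big] \notag \\
&= \mathbb{E}_{\kappa} \mathbb{E}_{\mathcal{C}} \Big[ \sum_{k,l,u,x,y} P_{K\hat{L}\hat{U}XY}(k, l, u, x, y)\, \mathbf{1}\big[ \exists\, \tilde{l} \ne l \text{ s.t. } \kappa(\tilde{l}) = \kappa(l),\ (u_{k\tilde{l}}, y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p) \big] \Big] \tag{313} \\
&\le \mathbb{E}_{\kappa} \mathbb{E}_{\mathcal{C}} \Big[ \sum_{k,l,u,x,y} P_{K\hat{L}\hat{U}XY}(k, l, u, x, y) \sum_{\tilde{l} \ne l} \mathbf{1}[\kappa(\tilde{l}) = \kappa(l)] \cdot \mathbf{1}[(u_{k\tilde{l}}, y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)] \Big] \tag{314} \\
&= \mathbb{E}_{\kappa} \mathbb{E}_{\mathcal{C}} \Big[ \sum_{k,l,x,y} P_{K\hat{L}XY}(k, l, x, y) \sum_{\tilde{l} \ne l} \mathbf{1}[\kappa(\tilde{l}) = \kappa(l)] \cdot \mathbf{1}[(u_{k\tilde{l}}, y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)] \Big] \tag{315} \\
&\le \frac{1}{|\mathcal{M}|}\, \mathbb{E}_{\mathcal{C}} \Big[ \sum_{k,l,x,y} P_{K\hat{L}XY}(k, l, x, y) \sum_{\tilde{l}} \mathbf{1}[(u_{k\tilde{l}}, y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)] \Big] \tag{316} \\
&= \frac{1}{|\mathcal{M}|}\, \mathbb{E}_{\mathcal{C}} \Big[ \sum_{k,y} P_K(k) P_Y(y) \sum_{\tilde{l}} \mathbf{1}[(u_{k\tilde{l}}, y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)] \Big] \tag{317} \\
&= \frac{1}{|\mathcal{M}|} \sum_{k,y} P_K(k) P_Y(y) \sum_{\tilde{l}} \mathbb{E}_{\mathcal{C}} \big[ \mathbf{1}[(u_{k\tilde{l}}, y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)] \big] \tag{318} \\
&= \frac{|\mathcal{L}|}{|\mathcal{M}|} \sum_{u,y} P_Y(y) P_U(u)\, \mathbf{1}[(u, y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)] \tag{319} \\
&= \frac{|\mathcal{L}|}{|\mathcal{M}|} \sum_{(u,y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)} P_Y(y) P_U(u), \tag{320}
\end{align}

where (314) follows from the union bound and (319) follows from the fact that $\mathcal{C}$ is generated according to $P_U$.
By uniting (311), (312), and (320) with (302), we have

\begin{align}
\mathbb{E}_{\kappa} \mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi; D)] &\le P_{UXY}((u, x) \notin \mathcal{T}_c^{\mathrm{WZ}}(\gamma_c) \cup (u, x, y) \notin \mathcal{T}_d^{\mathrm{WZ}}(D) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)) \notag \\
&\quad + \frac{|\mathcal{L}|}{|\mathcal{M}|} \sum_{(u,y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)} P_Y(y) P_U(u) + \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UX})}{|\mathcal{L}|}} + \frac{1}{2} \sqrt{\frac{2^{\gamma}}{|\mathcal{K}||\mathcal{L}|}} + P_U\big[ -\log P_U(u) > \gamma \big]. \tag{321}
\end{align}

Since we can choose $\gamma > 0$ and $|\mathcal{K}|$ arbitrarily large, we have

\begin{align}
\mathbb{E}_{\kappa} \mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi; D)] &\le P_{UXY}((u, x) \notin \mathcal{T}_c^{\mathrm{WZ}}(\gamma_c) \cup (u, x, y) \notin \mathcal{T}_d^{\mathrm{WZ}}(D) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)) \notag \\
&\quad + \frac{|\mathcal{L}|}{|\mathcal{M}|} \sum_{(u,y) \in \mathcal{T}_p^{\mathrm{WZ}}(\gamma_p)} P_Y(y) P_U(u) + \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{UX})}{|\mathcal{L}|}} + \delta. \tag{322}
\end{align}

Consequently, there exists at least one $(k, \kappa, \mathcal{C})$ such that $P_e(\Phi; D)$ is smaller than the right-hand-side of the
inequality above. This completes the proof of Theorem 17.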



Appendix G 

Proof of the Non-Asymptotic Bound for GP in Theorem 19

A. Code Construction

As in WAK, we use the stochastic map introduced in Appendix C. In GP coding, let $\mathcal{K}$ be the alphabet of the
common randomness for the encoder and the decoder, and let Z = S and $P_{UZ} = P_{US}$. Note that $\mathcal{T}_c(\gamma_c)$
defined in (203) is equivalent to $\mathcal{T}_c^{\mathrm{GP}}(\gamma_c)$ defined in (66) in this case.

For GP coding, we construct $|\mathcal{M}|$ stochastic maps. Each stochastic map corresponds to a message in $\mathcal{M}$. For
each message $m \in \mathcal{M}$, generate a codebook $\mathcal{C}^{(m)} = \{u^{(m)}_{11}, \ldots, u^{(m)}_{|\mathcal{K}||\mathcal{L}|}\}$ where each $u^{(m)}_{kl}$ is independently drawn
according to $P_U$. Then, for each $\mathcal{C}^{(m)}$ ($m \in \mathcal{M}$), construct a stochastic map $\varphi_{\mathcal{C}^{(m)}}$ as defined in (230).

By using $\{\varphi_{\mathcal{C}^{(m)}}\}_{m \in \mathcal{M}}$, we construct a GP code $\Phi$ as follows. Given the message $m \in \mathcal{M}$, the channel state
$s \in \mathcal{S}$, and the common randomness $k \in \mathcal{K}$, the encoder first generates $l \in \mathcal{L}$ according to $\varphi_{\mathcal{C}^{(m)}}(\cdot|k, s)$. Then, the
encoder generates $x \in \mathcal{X}$ according to $P_{X|US}(\cdot|u^{(m)}_{kl}, s)$ and inputs x into the channel. If the randomly generated x
results in $g(x) > \Gamma$ (i.e., the channel input does not satisfy the cost constraint), declare a cost-constraint violation
error (see footnote 7). Given the channel output $y \in \mathcal{Y}$ and the common randomness $k \in \mathcal{K}$, the decoder finds the unique index
$\hat{m} \in \mathcal{M}$ such that

\[ (u^{(\hat{m})}_{kl}, y) \in \mathcal{T}_p^{\mathrm{GP}}(\gamma_p) \tag{323} \]

for some $l \in \mathcal{L}$. If no such index $\hat{m}$ exists, or if there is more than one, then a decoding error is declared. This is a
Feinstein-like decoder [7] for average probability of error.

7 Even if $g(x) > \Gamma$ occurs, we still send x through the channel. The error event for this occurrence is accounted for in (326).
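B. Analysis of Error Probability

Without loss of generality, and by symmetry, we may assume that M = 1 is the message sent. Let $\hat{L}$ be the
random index chosen by the encoder via the stochastic map $\varphi_{\mathcal{C}^{(1)}}(\cdot|K, S)$, and let $\hat{U} = u^{(1)}_{K\hat{L}}$ be the chosen codeword.
Note that the joint distribution of $K, \hat{L}, \hat{U}, S$ is given as follows; cf. (231)

\[ P_{K\hat{L}\hat{U}S}(k, l, u, s) = \frac{1}{|\mathcal{K}|}\, P_S(s)\, \varphi_{\mathcal{C}^{(1)}}(l|k, s)\, \mathbf{1}[u^{(1)}_{kl} = u] \tag{324} \]

and then, the joint distribution of $K, \hat{L}, \hat{U}, S, X, Y$ is given as

\[ P_{K\hat{L}\hat{U}SXY}(k, l, u, s, x, y) = P_{K\hat{L}\hat{U}S}(k, l, u, s)\, P_{X|US}(x|u, s)\, W(y|x, s). \tag{325} \]

The smoothed versions $\bar{P}_{K\hat{L}\hat{U}S}$ and $\bar{P}_{K\hat{L}\hat{U}SXY}$ are given by substituting $P_S$ in (324) with $\bar{P}_S$; cf. (233).

If either a cost-constraint violation or a decoding error occurs, at least one of the following error events must
occur:

\[ \mathcal{E}_0 := \{ x \notin \mathcal{T}_g^{\mathrm{GP}}(\Gamma) \} \tag{326} \]

\[ \mathcal{E}_1 := \{ (u^{(1)}_{kl}, y) \notin \mathcal{T}_p^{\mathrm{GP}}(\gamma_p) \} \tag{327} \]

\[ \mathcal{E}_2 := \{ \exists\, \tilde{m} \ne 1 \text{ s.t. } (u^{(\tilde{m})}_{k\tilde{l}}, y) \in \mathcal{T}_p^{\mathrm{GP}}(\gamma_p) \text{ for some } \tilde{l} \in \mathcal{L} \}. \tag{328} \]

Hence, the error probability averaged over all the random codebooks $\mathcal{C} := \{\mathcal{C}^{(m)}\}_{m \in \mathcal{M}}$ and the common randomness
K can be bounded as

\begin{align}
\mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi; \Gamma)] &\le \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{L}\hat{U}SXY}(\mathcal{E}_0 \cup \mathcal{E}_1 \cup \mathcal{E}_2) \big] \tag{329} \\
&\le \mathbb{E}_{\mathcal{C}^{(1)}} \big[ P_{K\hat{L}\hat{U}SXY}(\mathcal{E}_0 \cup \mathcal{E}_1) \big] + \mathbb{E}_{\mathcal{C}} \big[ P_{K\hat{L}\hat{U}SXY}(\mathcal{E}_2) \big]. \tag{330}
\end{align}

At first, we evaluate the first term in (330). For a fixed codebook $\mathcal{C}^{(1)}$, writing $\mathcal{A} := \{ x \notin \mathcal{T}_g^{\mathrm{GP}}(\Gamma) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{GP}}(\gamma_p) \}$ for brevity,

\begin{align}
&P_{K\hat{L}\hat{U}SXY}(\mathcal{E}_0 \cup \mathcal{E}_1) \notag \\
&= \sum_{k,l,u,s,x,y} P_{K\hat{L}\hat{U}S}(k, l, u, s)\, P_{X|US}(x|u, s)\, W(y|x, s)\, \mathbf{1}[x \notin \mathcal{T}_g^{\mathrm{GP}}(\Gamma) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{GP}}(\gamma_p)] \tag{331} \\
&= \sum_{u,s,x,y} P_{\hat{U}S}(u, s)\, P_{X|US}(x|u, s)\, W(y|x, s)\, \mathbf{1}[x \notin \mathcal{T}_g^{\mathrm{GP}}(\Gamma) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{GP}}(\gamma_p)] \tag{332} \\
&= P_{\hat{U}SXY}(\mathcal{A}) \tag{333} \\
&\le \bar{P}_{USXY}(\mathcal{A}) + \frac{1 - \bar{P}_{USXY}(\mathcal{U} \times \mathcal{S} \times \mathcal{X} \times \mathcal{Y})}{2} + d(P_{\hat{U}SXY}, \bar{P}_{USXY}) \tag{334} \\
&= \bar{P}_{USXY}(\mathcal{A}) + \frac{P_{USXY}((u, s) \notin \mathcal{T}_c^{\mathrm{GP}}(\gamma_c))}{2} + d(P_{\hat{U}SXY}, \bar{P}_{USXY}) \tag{335} \\
&= \bar{P}_{USXY}(\mathcal{A}) + \frac{P_{US}((u, s) \notin \mathcal{T}_c^{\mathrm{GP}}(\gamma_c))}{2} + d(P_{\hat{U}S} P_{X|US} W, \bar{P}_{US} P_{X|US} W) \tag{336} \\
&\le \bar{P}_{USXY}(\mathcal{A}) + \frac{P_{US}((u, s) \notin \mathcal{T}_c^{\mathrm{GP}}(\gamma_c))}{2} + d(P_{\hat{U}S}, \bar{P}_{US}) \tag{337} \\
&\le \bar{P}_{USXY}(\mathcal{A}) + P_{US}((u, s) \notin \mathcal{T}_c^{\mathrm{GP}}(\gamma_c)) + d(\bar{P}_{\hat{U}S}, \bar{P}_{US}) \tag{338} \\
&= P_{USXY}(\mathcal{A} \cap (u, s) \in \mathcal{T}_c^{\mathrm{GP}}(\gamma_c)) + P_{USXY}((u, s) \notin \mathcal{T}_c^{\mathrm{GP}}(\gamma_c)) + d(\bar{P}_{\hat{U}S}, \bar{P}_{US}) \tag{339} \\
&= P_{USXY}(x \notin \mathcal{T}_g^{\mathrm{GP}}(\Gamma) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{GP}}(\gamma_p) \cup (u, s) \notin \mathcal{T}_c^{\mathrm{GP}}(\gamma_c)) + d(\bar{P}_{\hat{U}S}, \bar{P}_{US}), \tag{340}
\end{align}

where (334) follows from (183), (337) follows from the data-processing inequality (182), and (338) follows from
Lemma 35. By taking the average over $\mathcal{C}^{(1)}$, the second term in (340) is upper bounded as

\[ \mathbb{E}_{\mathcal{C}^{(1)}}\big[ d(\bar{P}_{\hat{U}S}, \bar{P}_{US}) \big] \le \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{US})}{|\mathcal{L}|}} + \frac{1}{2} \sqrt{\frac{2^{\gamma}}{|\mathcal{K}||\mathcal{L}|}} + P_U\big[ -\log P_U(u) > \gamma \big] \tag{341} \]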
by using Lemma 36.

Next, we evaluate the second term in (330).

[Pklusxy^)] 
= Ec 



y)l[3m # lj G £ s.t. (uj^y) G T p GP ( 7p )] 



<E C 



< Er 



E PrLUSXY 
k,l,u,s,x,y 



(M,t*,s,s,v)£l[(u£f ,y)e7- p GP ( 7p )] 



£ P Ktusx( k > h «, s, * a) £ l[(u<™, y) G T GP ( 7p )] 

k,l,u,s,x,y m,! 



ml 



\M\\L\ £ Py(y)Pu(u)l[(u, y) G r p GP ( 7p )] 



= |M||£| E ^v(y)^W 

(«,y)er p GP( 7p ) 

where (346) follows from the fact that $\mathcal{C}$ is generated according to $P_U$.
By combining (340), (341), and (347) with (330), we have

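\begin{align}
\mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi; \Gamma)] &\le P_{USXY}(x \notin \mathcal{T}_g^{\mathrm{GP}}(\Gamma) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{GP}}(\gamma_p) \cup (u, s) \notin \mathcal{T}_c^{\mathrm{GP}}(\gamma_c)) + |\mathcal{M}||\mathcal{L}| \sum_{(u,y) \in \mathcal{T}_p^{\mathrm{GP}}(\gamma_p)} P_Y(y) P_U(u) \notag \\
&\quad + \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{US})}{|\mathcal{L}|}} + \frac{1}{2} \sqrt{\frac{2^{\gamma}}{|\mathcal{K}||\mathcal{L}|}} + P_U\big[ -\log P_U(u) > \gamma \big]. \tag{348}
\end{align}

Since we can choose $\gamma > 0$ and $|\mathcal{K}|$ arbitrarily large, we have

\begin{align}
\mathbb{E}_{\mathcal{C}} \mathbb{E}_K [P_e(\Phi; \Gamma)] &\le P_{USXY}(x \notin \mathcal{T}_g^{\mathrm{GP}}(\Gamma) \cup (u, y) \notin \mathcal{T}_p^{\mathrm{GP}}(\gamma_p) \cup (u, s) \notin \mathcal{T}_c^{\mathrm{GP}}(\gamma_c)) + |\mathcal{M}||\mathcal{L}| \sum_{(u,y) \in \mathcal{T}_p^{\mathrm{GP}}(\gamma_p)} P_Y(y) P_U(u) \notag \\
&\quad + \frac{1}{2} \sqrt{\frac{\Delta(\gamma_c, P_{US})}{|\mathcal{L}|}} + \delta. \tag{349}
\end{align}

Consequently, there exists at least one realization of the random code $(k, \mathcal{C})$ such that $P_e(\Phi; \Gamma)$, defined in (25),
is no larger than the right-hand-side of inequality (349). This completes the proof of Theorem 19.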

Appendix H 

Preliminaries for Proofs of the Second-Order Coding Rate 

In this appendix, we provide some technical results that will be used in Appendices I, K and M. More specifically,
we will use the following multidimensional Berry-Esseen theorem and its corollary.

Theorem 37 (Goetze [23]). Let $U_1, \ldots, U_n$ be independent random vectors in $\mathbb{R}^k$ with zero mean. Let $S_n = \frac{1}{\sqrt{n}}(U_1 + \cdots + U_n)$, $\mathrm{Cov}(S_n) = I$, and $\xi = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[\|U_i\|_2^3]$. Let the standard Gaussian random vector $Z \sim \mathcal{N}(0, I)$. Then, for all $n \in \mathbb{N}$, we have

\begin{equation}
\sup_{\mathscr{C} \in \mathfrak{C}_k} \left| \Pr\{S_n \in \mathscr{C}\} - \Pr\{Z \in \mathscr{C}\} \right| \le \frac{254\sqrt{k}\,\xi}{\sqrt{n}}, \tag{350}
\end{equation}

where $\mathfrak{C}_k$ is the family of all convex, Borel measurable subsets of $\mathbb{R}^k$.

It should be noted that Theorem 37 can be applied to random vectors that are independent but not necessarily
identically distributed. For i.i.d. random vectors, Bentkus [52] proved that the dependency of the bound on the dimension can
be improved from $\sqrt{k}$ to $k^{1/4}$.

We will frequently encounter random vectors with non-identity covariance matrices. Thus, we slightly modify
Theorem 37 in a similar manner as [25, Corollary 7] as follows.



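Corollary 38. Let $U_1, \ldots, U_n$ be independent random vectors in $\mathbb{R}^k$ with zero mean. Let $S_n = \frac{1}{\sqrt{n}}(U_1 + \cdots + U_n)$,
$\mathrm{Cov}(S_n) = V \succ 0$, and $\xi = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[\|U_i\|_2^3]$. Let the Gaussian random vector $Z \sim \mathcal{N}(0, V)$. Then, for all $n \in \mathbb{N}$,

\begin{equation}
\sup_{\mathscr{C} \in \mathfrak{C}_k} \left| \Pr\{S_n \in \mathscr{C}\} - \Pr\{Z \in \mathscr{C}\} \right| \le \frac{254\sqrt{k}\,\xi}{\lambda_{\min}(V)^{3/2}\sqrt{n}}, \tag{351}
\end{equation}

where $\mathfrak{C}_k$ is the family of all convex, Borel measurable subsets of $\mathbb{R}^k$, and where $\lambda_{\min}(V)$ is the smallest eigenvalue
of $V$.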
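The passage from Theorem 37 to Corollary 38 can be understood as a whitening argument. The following is only a sketch of that reduction, not a reproduction of the proof in [25]; the whitened quantities $\tilde{U}_i$, $\tilde{S}_n$ and $\tilde{\xi}$ are names introduced here for illustration.

```latex
% Whitening: rescale so that the covariance becomes the identity.
\tilde{U}_i := V^{-1/2} U_i, \qquad
\tilde{S}_n := \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde{U}_i = V^{-1/2} S_n, \qquad
\mathrm{Cov}(\tilde{S}_n) = V^{-1/2} V V^{-1/2} = I.

% The third-moment term grows by at most \lambda_{\min}(V)^{-3/2},
% because the operator norm of V^{-1/2} equals \lambda_{\min}(V)^{-1/2}.
\|\tilde{U}_i\|_2 \le \lambda_{\min}(V)^{-1/2} \|U_i\|_2
\;\Longrightarrow\;
\tilde{\xi} := \frac{1}{n} \sum_{i=1}^n \mathbb{E}\big[\|\tilde{U}_i\|_2^3\big]
\le \lambda_{\min}(V)^{-3/2}\, \xi.

% Convexity is preserved by the invertible linear map V^{-1/2}, so the
% supremum over convex sets is unchanged:
\Pr\{S_n \in \mathscr{C}\} = \Pr\{\tilde{S}_n \in V^{-1/2}\mathscr{C}\}, \qquad
\Pr\{Z \in \mathscr{C}\} = \Pr\{V^{-1/2} Z \in V^{-1/2}\mathscr{C}\},
\qquad V^{-1/2} Z \sim \mathcal{N}(0, I).
```

Applying Theorem 37 to $\tilde{U}_1, \ldots, \tilde{U}_n$ and substituting the bound on $\tilde{\xi}$ then accounts for the extra factor involving $\lambda_{\min}(V)$ in (351).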

Appendix I 

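ACHIEVABILITY PROOF OF THE SECOND-ORDER CODING RATE FOR WAK IN THEOREM 24

Proof: It suffices to show the inclusion $\mathscr{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY}) \subset \mathscr{R}_{\mathrm{WAK}}(n, \varepsilon)$ for fixed $P_{UTXY} \in \mathscr{P}(P_{XY})$.
We first consider the case where $V = V(P_{UTXY}) \succ 0$. First, note that $R \in \mathscr{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY})$ implies

\begin{equation}
z := \sqrt{n}\left(R - J - \frac{2\log n}{n}\mathbf{1}_2\right) \in \mathscr{S}(V, \varepsilon). \tag{352}
\end{equation}

We fix a time-sharing sequence $t^n \in \mathcal{T}^n$ with type $P_{t^n} \in \mathscr{P}_n(\mathcal{T})$ such that

\begin{equation}
|P_{t^n}(t) - P_T(t)| \le \frac{1}{n} \tag{353}
\end{equation}

for every $t \in \mathcal{T}$. Then, we consider the test channel given by $P_{U^n|Y^n}(u^n|y^n) = P^n_{U|TY}(u^n|t^n, y^n)$, and we use
Corollary 15 for $P_{U^nX^nY^n} = P^n_{XY}P_{U^n|Y^n}$ by setting $\gamma_b = \log|\mathcal{M}_n| - \log n$, $\gamma_c = \log|\mathcal{L}_n| - \log n$, and $\delta = \frac{1}{n}$.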
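A sequence satisfying (353) always exists, and one simple way to build it is largest-remainder rounding of $nP_T(t)$. The sketch below is illustrative only: the function name `time_sharing_sequence` and the dictionary representation of $P_T$ are assumptions made here, not constructions taken from the paper.

```python
from fractions import Fraction
from math import floor


def time_sharing_sequence(p_t, n):
    """Build a length-n sequence whose type is within 1/n of p_t.

    p_t maps each symbol t to its probability (ideally a Fraction, so that
    the probabilities sum to exactly 1). Hypothetical helper for illustration.
    """
    symbols = list(p_t)
    # Ideal (generally non-integer) number of occurrences of each symbol.
    exact = {t: Fraction(p_t[t]) * n for t in symbols}
    # Start from the floor of each ideal count.
    counts = {t: floor(exact[t]) for t in symbols}
    # Hand out the remaining positions to the largest fractional parts.
    leftover = n - sum(counts.values())
    by_remainder = sorted(symbols, key=lambda t: exact[t] - counts[t], reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    # Each count is floor(n p) or floor(n p) + 1, so |count/n - p| < 1/n.
    sequence = []
    for t in symbols:
        sequence.extend([t] * counts[t])
    return sequence
```

For example, with $P_T = (1/3, 1/6, 1/2)$ and $n = 10$ the counts come out as 3, 2 and 5, so every empirical frequency deviates from $P_T(t)$ by less than $1/10$.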

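Then, there exists a WAK code $\Phi_n$ such that

\begin{align}
1 - P_e(\Phi_n) &\ge \Pr\left\{ \sum_{i=1}^n j(U_i, X_i, Y_i|t_i) \le nR - \log n\,\mathbf{1}_2 \right\} - O\left(\frac{1}{\sqrt{n}}\right) \tag{354} \\
&= \Pr\left\{ \frac{1}{\sqrt{n}} \sum_{i=1}^n \big( j(U_i, X_i, Y_i|t_i) - J \big) \le z + \frac{\log n}{\sqrt{n}}\mathbf{1}_2 \right\} - O\left(\frac{1}{\sqrt{n}}\right). \tag{355}
\end{align}

By applying Corollary 38 to the first term of (355), we have

\begin{align}
1 - P_e(\Phi_n) &\ge \Pr\left\{ Z \le z + \frac{\log n}{\sqrt{n}}\mathbf{1}_2 \right\} - O\left(\frac{1}{\sqrt{n}}\right) \tag{356} \\
&= \Pr\{Z \le z\} + O\left(\frac{\log n}{\sqrt{n}}\right) \tag{357} \\
&\ge 1 - \varepsilon \tag{358}
\end{align}

for sufficiently large $n$, where (357) follows from Taylor's approximation, and (358) follows from (352).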
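The step from (352) to (358) uses membership $z \in \mathscr{S}(V, \varepsilon)$ in the form $\Pr\{Z \le z\} \ge 1 - \varepsilon$ for $Z \sim \mathcal{N}(0, V)$. As a rough numerical illustration for the two-dimensional WAK case with $V \succ 0$, the sketch below evaluates this probability by one-dimensional trapezoidal integration; the function names, the integration range and the step count are assumptions chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def bivariate_cdf(z1, z2, v11, v22, v12, steps=4000):
    """Approximate Pr{Z1 <= z1, Z2 <= z2} for Z ~ N(0, [[v11, v12], [v12, v22]])."""
    s1, s2 = sqrt(v11), sqrt(v22)
    rho = v12 / (s1 * s2)
    marginal = NormalDist(0.0, s1)            # law of Z1
    std = NormalDist()                         # standard normal
    cond_sd = s2 * sqrt(1.0 - rho * rho)       # std. dev. of Z2 given Z1
    lo, hi = -10.0 * s1, min(z1, 10.0 * s1)    # truncate the negligible tails
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # Given Z1 = x, Z2 is Gaussian with mean rho * s2 * x / s1.
        inner = std.cdf((z2 - rho * s2 * x / s1) / cond_sd)
        total += weight * marginal.pdf(x) * inner
    return total * h


def in_S(z, v11, v22, v12, eps):
    """Check the condition Pr{Z <= z} >= 1 - eps used in (358)."""
    return bivariate_cdf(z[0], z[1], v11, v22, v12) >= 1.0 - eps
```

With correlation $1/2$ the orthant probability $\Pr\{Z_1 \le 0, Z_2 \le 0\}$ equals $1/3$, which gives a convenient sanity check of the integration.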

Next, we consider the case where $V$ is singular but not $0$. In this case, we cannot apply Corollary 38 because
$\lambda_{\min}(V) = 0$. Since $\mathrm{rank}(V) = 1$, we can write $V = vv^T$ by using a vector $v$. Let $A_i = j(U_i, X_i, Y_i|t_i) - J$.
Then we can write $A_i = vB_i$ by using scalar independent random variables $\{B_i\}_{i=1}^n$. Thus, by using the
ordinary Berry-Esseen theorem [63, Ch. XVI] for $\{B_i\}_{i=1}^n$, we can derive (358).

Finally, we consider the case where $V = 0$. In this case, by setting $z = 0$ in (355), we can find that the
right-hand side converges to 1. ■
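The scalar Berry-Esseen theorem invoked in the singular case can be checked numerically on a simple example. The sketch below computes the exact Kolmogorov distance between a standardized sum of i.i.d. Bernoulli variables and the standard Gaussian, and compares it with a bound of the form $c\,\rho/(\sigma^3\sqrt{n})$. The Bernoulli model and the constant $c = 0.56$ are assumptions made for this illustration; the source does not specify a constant.

```python
from math import comb, sqrt
from statistics import NormalDist


def kolmogorov_distance_binomial(n, p):
    """sup_x |Pr{S_n <= x} - Phi(x)| for the standardized sum of n Bernoulli(p)."""
    phi = NormalDist().cdf
    sigma = sqrt(p * (1.0 - p))
    cdf_below = 0.0   # Pr{sum < k}, i.e. the left limit of the CDF at the jump
    worst = 0.0
    for k in range(n + 1):
        x = (k - n * p) / (sigma * sqrt(n))          # standardized jump location
        pmf = comb(n, k) * p**k * (1.0 - p) ** (n - k)
        # The supremum is attained at a jump, from the left or from the right.
        worst = max(worst, abs(cdf_below - phi(x)), abs(cdf_below + pmf - phi(x)))
        cdf_below += pmf
    return worst


def berry_esseen_bound(n, p, c=0.56):
    """c * rho / (sigma^3 sqrt(n)), with rho the third absolute central moment."""
    sigma = sqrt(p * (1.0 - p))
    rho = p * (1.0 - p) * ((1.0 - p) ** 2 + p**2)
    return c * rho / (sigma**3 * sqrt(n))
```

Both quantities decay like $1/\sqrt{n}$, which is the order of the residual term absorbed in (356).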






Appendix J 

ACHIEVABILITY PROOF OF THE SECOND-ORDER CODING RATE FOR WAK IN THEOREM 25

Proof: We only provide a sketch of the proof because most of the steps are the same as in Appendix I. The
only modification is that we use Theorem 16 instead of Corollary 15 by setting $\gamma_b = \log|\mathcal{M}_n| - \rho\sqrt{n} - \log n$,
$\gamma_c = \log|\mathcal{L}_n| + \rho\sqrt{n} - \log n$, $J_n = |\mathcal{L}_n|2^{\rho\sqrt{n}}$, and $\delta = \frac{1}{n}$. ■



Appendix K 

ACHIEVABILITY PROOF OF THE SECOND-ORDER CODING RATE FOR WZ IN THEOREM 

Proof: It suffices to show the inclusion $\mathscr{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY}, g) \subset \mathscr{R}_{\mathrm{WZ}}(n, \varepsilon)$ for fixed $(P_{UTXY}, g) \in \mathscr{P}(P_{XY})$.
We assume that $V = V(P_{UTXY}, g) \succ 0$, since the case where $V$ is singular can be handled in a similar manner
as in Appendix I (see also [25, Proof of Theorem 5]).
First, note that $[R, D]^T \in \mathscr{R}_{\mathrm{in}}(n, \varepsilon; P_{UTXY}, g)$ implies



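\begin{equation}
z := \sqrt{n}\left( \begin{bmatrix} \frac{1}{n}\log\frac{|\mathcal{M}_n|}{L_n} \\ \frac{1}{n}\log L_n \\ D \end{bmatrix} - J - \frac{2\log n}{n}\mathbf{1}_3 \right) \in \mathscr{S}(V, \varepsilon) \tag{359}
\end{equation}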



for some positive integer $L_n$. We fix a sequence $t^n \in \mathcal{T}^n$ satisfying (353) for every $t \in \mathcal{T}$. Then, we consider the
test channel given by $P_{U^n|X^n}(u^n|x^n) = P^n_{U|TX}(u^n|t^n, x^n)$, and we use Corollary 18 for $P_{U^nX^nY^n} = P^n_{XY}P_{U^n|X^n}$
by setting $\gamma_p = \log\frac{L_n}{|\mathcal{M}_n|} + \log n$, $\gamma_c = \log L_n - \log n$, and $\delta = \frac{1}{n}$. Then, there exists a WZ code $\Phi_n$ such that



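\begin{align}
1 - P_e(\Phi_n; D) &\ge \Pr\left\{ \sum_{i=1}^n j(U_i, X_i, Y_i|t_i) \le \begin{bmatrix} \log\frac{|\mathcal{M}_n|}{L_n} \\ \log L_n \\ nD \end{bmatrix} - \log n\,\mathbf{1}_3 \right\} - O\left(\frac{1}{\sqrt{n}}\right) \tag{360} \\
&= \Pr\left\{ \frac{1}{\sqrt{n}} \sum_{i=1}^n \big( j(U_i, X_i, Y_i|t_i) - J \big) \le z + \frac{\log n}{\sqrt{n}}\mathbf{1}_3 \right\} - O\left(\frac{1}{\sqrt{n}}\right). \tag{361}
\end{align}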



Now the rest of the proof proceeds by using the multidimensional Berry-Esseen theorem as in (356) to (358) for
the WAK problem. ■



Appendix L 

ACHIEVABILITY PROOF OF THE SECOND-ORDER CODING RATE FOR LOSSY SOURCE CODING IN THEOREM 29

We slightly modify a special case of Corollary 18 as follows, which will be used in both Appendices L-A and
L-B.

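Corollary 39. For an arbitrary distribution $Q_{\hat{X}} \in \mathscr{P}(\hat{\mathcal{X}})$, and for arbitrary constants $\gamma_c, \nu \ge 0$ and $\delta, \tilde{\delta} > 0$, there
exists a lossy source code $\Phi$ with probability of excess distortion satisfying

\begin{equation}
P_e(\Phi; D) \le P_{X\hat{X}}\left[ \log\frac{P_{\hat{X}|X}(\hat{x}|x)}{Q_{\hat{X}}(\hat{x})} > \gamma_c - \nu \ \text{or}\ d(x, \hat{x}) > D \right] + \delta + \sqrt{\frac{2^{\gamma_c}}{\tilde{\delta}|\mathcal{M}|}} + \tilde{\delta} + 2^{-\nu}. \tag{362}
\end{equation}

Proof: As a special case of Corollary 18, we have

\begin{equation}
P_e(\Phi; D) \le P_{X\hat{X}}\left[ \log\frac{P_{\hat{X}|X}(\hat{x}|x)}{P_{\hat{X}}(\hat{x})} > \gamma_c \ \text{or}\ d(x, \hat{x}) > D \right] + \delta + \sqrt{\frac{2^{\gamma_c}}{\tilde{\delta}|\mathcal{M}|}} + \tilde{\delta}, \tag{363}
\end{equation}

where we set $\gamma_p = 0$ and $L = \tilde{\delta}|\mathcal{M}|$. We can further upper bound the first term of (363) as

\begin{align}
& P_{X\hat{X}}\left[ \log\frac{P_{\hat{X}|X}(\hat{x}|x)}{P_{\hat{X}}(\hat{x})} > \gamma_c \ \text{or}\ d(x, \hat{x}) > D \right] \tag{364} \\
&= P_{X\hat{X}}\left[ \log\frac{P_{\hat{X}|X}(\hat{x}|x)}{Q_{\hat{X}}(\hat{x})} + \log\frac{Q_{\hat{X}}(\hat{x})}{P_{\hat{X}}(\hat{x})} > \gamma_c \ \text{or}\ d(x, \hat{x}) > D \right] \tag{365} \\
&\le P_{X\hat{X}}\left[ \log\frac{P_{\hat{X}|X}(\hat{x}|x)}{Q_{\hat{X}}(\hat{x})} > \gamma_c - \nu \ \text{or}\ \log\frac{Q_{\hat{X}}(\hat{x})}{P_{\hat{X}}(\hat{x})} > \nu \ \text{or}\ d(x, \hat{x}) > D \right] \tag{366} \\
&\le P_{X\hat{X}}\left[ \log\frac{P_{\hat{X}|X}(\hat{x}|x)}{Q_{\hat{X}}(\hat{x})} > \gamma_c - \nu \ \text{or}\ d(x, \hat{x}) > D \right] + P_{X\hat{X}}\left[ \log\frac{Q_{\hat{X}}(\hat{x})}{P_{\hat{X}}(\hat{x})} > \nu \right] \tag{367} \\
&= P_{X\hat{X}}\left[ \log\frac{P_{\hat{X}|X}(\hat{x}|x)}{Q_{\hat{X}}(\hat{x})} > \gamma_c - \nu \ \text{or}\ d(x, \hat{x}) > D \right] + P_{\hat{X}}\left[ \log\frac{Q_{\hat{X}}(\hat{x})}{P_{\hat{X}}(\hat{x})} > \nu \right] \tag{368} \\
&\le P_{X\hat{X}}\left[ \log\frac{P_{\hat{X}|X}(\hat{x}|x)}{Q_{\hat{X}}(\hat{x})} > \gamma_c - \nu \ \text{or}\ d(x, \hat{x}) > D \right] + 2^{-\nu}. \tag{369}
\end{align}

This completes the proof. ■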
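The final step of the proof, which replaces the probability of a large likelihood ratio between $Q_{\hat{X}}$ and $P_{\hat{X}}$ by $2^{-\nu}$, is a standard change-of-measure estimate. A worked version is sketched below; the set $\mathcal{A}_\nu$ is a name introduced here only for illustration.

```latex
\mathcal{A}_\nu := \Big\{ \hat{x} : \log\frac{Q_{\hat{X}}(\hat{x})}{P_{\hat{X}}(\hat{x})} > \nu \Big\}
               = \big\{ \hat{x} : P_{\hat{X}}(\hat{x}) < 2^{-\nu} Q_{\hat{X}}(\hat{x}) \big\},

P_{\hat{X}}(\mathcal{A}_\nu)
  = \sum_{\hat{x} \in \mathcal{A}_\nu} P_{\hat{X}}(\hat{x})
  \le 2^{-\nu} \sum_{\hat{x} \in \mathcal{A}_\nu} Q_{\hat{X}}(\hat{x})
  \le 2^{-\nu}.
```

The bound holds for every choice of $Q_{\hat{X}}$, which is what allows the corollary to be stated for an arbitrary output distribution.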
Remark 6. By showing Corollary \39\ directly instead of via Corollary \T8\ we can eliminate the residual term 5. 

A. Proof Based on the Method of Types 

To prove Theorem 29 by the method of types, we use the following lemma.

Lemma 40 (Rate-Redundancy [27]). Suppose that $R(P_X, D)$ is differentiable w.r.t. $D$ and twice differentiable
w.r.t. $P_X$ at some neighbourhood of $(P_X, D)$. Let $\varepsilon$ be a given probability and let $\Delta R$ be any quantity chosen such
that

\begin{equation}
P_X^n\left[ R(P_{x^n}, D) - R(P_X, D) > \Delta R \right] = \varepsilon + g_n \tag{370}
\end{equation}

where $g_n = O\left(\frac{\log n}{\sqrt{n}}\right)$. Then, as $n$ grows,

\begin{equation}
\Delta R = \sqrt{\frac{\mathrm{Var}(j(X, D))}{n}}\, Q^{-1}(\varepsilon) + O\left(\frac{\log n}{n}\right). \tag{371}
\end{equation}

Note that the quantity $j(x, D)$ has an alternative representation as the derivative of $Q \mapsto R(Q, D)$ with respect
to $Q(x)$ evaluated at $P_X(x)$; cf. (152).

We also use the following lemma, which is a consequence of the argument right after [64, Theorem 1].



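Lemma 41. For a type $q \in \mathscr{P}_n(\mathcal{X})$, suppose that $\left|\frac{\partial R(q, D)}{\partial D}\right| \le C$ for a constant $C > 0$ in some neighbourhood of
$q$. Then, there exists a test channel $V \in \mathscr{V}_n(\hat{\mathcal{X}}; q)$ such that

\begin{equation}
\sum_{x, \hat{x}} q(x)V(\hat{x}|x)d(x, \hat{x}) \le D \tag{372}
\end{equation}

and

\begin{equation}
I(q, V) \le R(q, D) + \frac{\tau}{n}, \tag{373}
\end{equation}

where $\tau$ is a constant depending on $C$, $|\mathcal{X}|$, $|\hat{\mathcal{X}}|$, and $D_{\max}$.

Using Lemmas 40 and 41, we prove Theorem 29.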

Proof: We construct a test channel $P_{\hat{X}^n|X^n}$ as follows. For a fixed constant f > 0, we set

n n =\qe^ n (X):\\P x -q\\ 2 <^^ 

n 



(374) 
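Since we assumed that $R(P_X, D)$ is differentiable w.r.t. $D$ at $P_X$, the derivative is bounded over any small enough
neighbourhood of $P_X$. In particular, it is bounded by some constant $C$ over $\Omega_n$ for sufficiently large $n$. For each
$q \in \Omega_n$, we choose a test channel $V_q \in \mathscr{V}_n(\hat{\mathcal{X}}; q)$ satisfying the statement of Lemma 41. Then, we define the test
channel

\begin{equation}
P_{\hat{X}^n|X^n}(\hat{x}^n|x^n) = \begin{cases} \dfrac{1}{|T_{V_{P_{x^n}}}(x^n)|} & \text{if } \hat{x}^n \in T_{V_{P_{x^n}}}(x^n) \\ 0 & \text{else} \end{cases} \tag{375}
\end{equation}

for $x^n$ satisfying $P_{x^n} \in \Omega_n$, and otherwise we define $P_{\hat{X}^n|X^n}(\hat{x}^n|x^n)$ arbitrarily as long as the channel only outputs
$\hat{x}^n$ satisfying $d_n(x^n, \hat{x}^n) \le D$. Let $P_q \in \mathscr{P}_n(\hat{\mathcal{X}})$ be such that

\begin{equation}
P_q(\hat{x}) = \sum_{x} q(x)V_q(\hat{x}|x). \tag{376}
\end{equation}

Then, let $P_q^n \in \mathscr{P}(\hat{\mathcal{X}}^n)$ be the uniform distribution on $T_{P_q}$. Furthermore, let $Q_{\hat{X}^n} \in \mathscr{P}(\hat{\mathcal{X}}^n)$ be the distribution
given by

$$Q_{\hat{X}^n}(\hat{x}^n) = \sum_{q \in \Omega_n} \frac{1}{|\Omega_n|} P_q^n(\hat{x}^n).$$

We now use Corollary 39 for $P_X = P_X^n$, $P_{\hat{X}|X} = P_{\hat{X}^n|X^n}$, and $Q_{\hat{X}} = Q_{\hat{X}^n}$. Then, by noting that

$$d_n(x^n, \hat{x}^n) = \sum_{x, \hat{x}} P_{x^n}(x)V_{P_{x^n}}(\hat{x}|x)d(x, \hat{x}) > D$$

never occurs for the test channel $P_{\hat{X}^n|X^n}$, we have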



P e (<f> n ;D) < P XnXr 



P xn]xn {x n \x n ) 
lo g Fi — 7^T\ > 7c - v 



1 P xn{xn (x"\x") 

~ log — — — > 7 



+ 5 + 
log n 



27c 



+ 5 + 2^ 



+ 



n27™ 3 
+ -, 



\M n \ n 



where we set $\gamma_c = \gamma n$, $\delta = \tilde{\delta} = \frac{1}{n}$, and $\nu = \log n$. Furthermore, by noting that

1 

I 

for any g E f2 n , we have 



Qx„(£ n ) > ^-j-^(s n ) 



X n X r. 



< P 



1 , ^x«|X"(^ n l xn ) _ logn 
- lQ g 7^ 7^ > 7 



< p 



X n X r. 



< P 



X n X r. 



1 






lo 


n 




1 






lo 


n 




1 






lo 


n 





X n \X r - 



n 







P X n\ X « 


(x n \x n ) 


Qx, 


Xx n ) 


Px^\X" 


(x n \x n ) 



logn 

> 7 — — ,Px* g n r 



>1 



n 

log n 
n 



> -fx™ G *\lri 



2f 

+ — 

n z 



logn | ATI log(n + 1) 
> 7 , "a;" 6 S2„ 



+ 



2f 



(377) 



(378) 



(379) 
(380) 

(381) 

(382) 
(383) 
(384) 
(385) 

/>;;. (.;•») 

where (384) follows from [27, Lemma 2] and (385) follows from (381) and the fact that $|\Omega_n| \le |\mathscr{P}_n(\mathcal{X})| \le (n + 1)^{|\mathcal{X}|}$.

Furthermore, we also have



log' 



PpA xn ) 



log 



|7> P 



n/(P^,y P ^) + 0(logn). 



(386) 
(387) 
Thus, for $\mu_n = O\left(\frac{\log n}{n}\right)$, we have



P e ($ n ; D) < P™ [I(P x n , V P , n ) > 7 - fi n , P x n G O n ] + O ( - ) + 



< P 



X n X r - 



R(P x n , D) > 7 - /x n - -, iV g ft r 

n 



, 1\ /n27« 



< P 



X n X r - 



R(P xn ,D)>J-Vr. 



n 



< P X r 



R(P x n,D)>i- fin -- 

n 



(388) 
(389) 
(390) 
(391) 



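Thus, by setting $\gamma = R(P_X, D) + \Delta R$ and $\frac{1}{n}\log|\mathcal{M}_n| = \gamma + \frac{2\log n}{n}$, and by using Lemma 40 (with $g_n$
being the residual terms in (391)), we have

\begin{equation}
R(n, \varepsilon; D) \le R(P_X, D) + \sqrt{\frac{\mathrm{Var}(j(X, D))}{n}}\, Q^{-1}(\varepsilon) + O\left(\frac{\log n}{n}\right) \tag{392}
\end{equation}

for sufficiently large $n$, which implies the statement of the theorem.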
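To get a feel for the size of the second-order term in (392), the sketch below evaluates the first two terms for a concrete example. It assumes a Bernoulli($p$) source with Hamming distortion and $D < p \le 1/2$, for which the $D$-tilted information is commonly written as $j(x, D) = -\log P_X(x) - h(D)$ with $h$ the binary entropy; this closed form, the function names and the omission of the $O(\log n / n)$ term are assumptions of this illustration and are not taken from the present text.

```python
from math import log2, sqrt
from statistics import NormalDist


def h2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)


def second_order_rate(p, D, n, eps):
    """R(P_X, D) + sqrt(Var(j(X, D)) / n) * Q^{-1}(eps) for a Bernoulli(p) source.

    Assumes Hamming distortion and 0 <= D < p <= 1/2 (hypothetical example).
    """
    assert 0.0 <= D < p <= 0.5 and 0.0 < eps < 1.0
    rate = h2(p) - h2(D)                       # rate-distortion function
    # Assumed D-tilted information of each source symbol.
    j = {0: -log2(1.0 - p) - h2(D), 1: -log2(p) - h2(D)}
    mean = (1.0 - p) * j[0] + p * j[1]         # equals the rate-distortion function
    var = (1.0 - p) * (j[0] - mean) ** 2 + p * (j[1] - mean) ** 2
    q_inv = NormalDist().inv_cdf(1.0 - eps)    # Q^{-1}(eps)
    return rate + sqrt(var / n) * q_inv
```

For $p = 1/2$ the variance vanishes and the approximation collapses to $1 - h(D)$; for a biased source the rate penalty shrinks like $1/\sqrt{n}$ as the blocklength grows.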



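B. Proof Based on the D-tilted Information

Let

\begin{equation}
B_D(x^n) := \{\hat{x}^n : d_n(x^n, \hat{x}^n) \le D\} \tag{393}
\end{equation}

be the $D$-sphere, and let $P^\star_{\hat{X}}$ be the output distribution of the optimal test channel of

\begin{equation}
\min_{P_{\hat{X}|X} :\, \mathbb{E}[d(X, \hat{X})] \le D} I(X; \hat{X}). \tag{394}
\end{equation}

To prove Theorem 29 by the D-tilted information, we use the following lemma.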



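Lemma 42 (Lemma 2 of [28]). Under some regularity conditions, which are explicitly given in [28, Lemma 2]
and satisfied by discrete memoryless sources, there exist constants $n_0, c, K > 0$ such that

\begin{equation}
P_X^n\left[ \log\frac{1}{(P^\star_{\hat{X}})^n(B_D(x^n))} \le \sum_{i=1}^n j(x_i, D) + C\log n + c \right] \ge 1 - \frac{K}{\sqrt{n}} \tag{395}
\end{equation}

for all $n \ge n_0$, where $C > 0$ is a constant given by [28, Equation (86)].

Proof: We construct the test channel $P_{\hat{X}^n|X^n}$ as

\begin{equation}
P_{\hat{X}^n|X^n}(\hat{x}^n|x^n) = \begin{cases} \dfrac{(P^\star_{\hat{X}})^n(\hat{x}^n)}{(P^\star_{\hat{X}})^n(B_D(x^n))} & \text{if } \hat{x}^n \in B_D(x^n) \\ 0 & \text{else.} \end{cases} \tag{396}
\end{equation}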



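We now use Corollary 39 for $P_X = P_X^n$, $P_{\hat{X}|X} = P_{\hat{X}^n|X^n}$, $Q_{\hat{X}} = (P^\star_{\hat{X}})^n$, $\gamma_c = \gamma n$, $\delta = \tilde{\delta} = \frac{1}{n}$ and $\nu = \log n$. Then,
by noting that $d_n(x^n, \hat{x}^n) > D$ never occurs for the test channel $P_{\hat{X}^n|X^n}$, we have

\begin{align}
P_e(\Phi_n; D) &\le P_{X^n\hat{X}^n}\left[ \log\frac{P_{\hat{X}^n|X^n}(\hat{x}^n|x^n)}{(P^\star_{\hat{X}})^n(\hat{x}^n)} > \gamma n - \log n \right] + \sqrt{\frac{n2^{\gamma n}}{|\mathcal{M}_n|}} + \frac{3}{n} \tag{397} \\
&\le P_{X^n\hat{X}^n}\left[ \log\frac{1}{(P^\star_{\hat{X}})^n(B_D(x^n))} > \gamma n - \log n \right] + \sqrt{\frac{n2^{\gamma n}}{|\mathcal{M}_n|}} + \frac{3}{n} \tag{398} \\
&= P_X^n\left[ \log\frac{1}{(P^\star_{\hat{X}})^n(B_D(x^n))} > \gamma n - \log n \right] + \sqrt{\frac{n2^{\gamma n}}{|\mathcal{M}_n|}} + \frac{3}{n} \tag{399} \\
&\le P_X^n\left[ \sum_{i=1}^n j(x_i, D) > \gamma n - (C + 1)\log n - c \right] \tag{400} \\
&\quad + P_X^n\left[ \log\frac{1}{(P^\star_{\hat{X}})^n(B_D(x^n))} > \sum_{i=1}^n j(x_i, D) + C\log n + c \right] + \sqrt{\frac{n2^{\gamma n}}{|\mathcal{M}_n|}} + \frac{3}{n} \tag{401} \\
&\le P_X^n\left[ \sum_{i=1}^n j(x_i, D) > \gamma n - (C + 1)\log n - c \right] + \frac{K}{\sqrt{n}} + \sqrt{\frac{n2^{\gamma n}}{|\mathcal{M}_n|}} + \frac{3}{n}, \tag{402}
\end{align}

where (402) follows from Lemma 42. Thus, by setting $\gamma = \frac{1}{n}\log|\mathcal{M}_n| - \frac{2\log n}{n}$ and by applying the Berry-Esseen
theorem [63], we have (392) for sufficiently large $n$, which implies the statement of the theorem. ■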



Appendix M 

ACHIEVABILITY PROOF OF THE SECOND-ORDER CODING RATE FOR GP IN THEOREM 

Proof: It suffices to show the inclusion $\mathscr{R}_{\mathrm{in}}(n, \varepsilon; P_{TUSXY}) \subset \mathscr{R}_{\mathrm{GP}}(n, \varepsilon)$ for a fixed $P_{TUSXY} \in \mathscr{P}(W, P_S)$.
We assume that $V = V(P_{TUSXY}, g) \succ 0$, since the case where $V$ is singular can be handled in a similar manner
as in Appendix I (see also [25, Proof of Theorem 5]).

First, note that $[R, \Gamma]^T \in \mathscr{R}_{\mathrm{in}}(n, \varepsilon; P_{TUSXY})$ implies that



z : = 



71 



log|A^ n |L n 
— log L n 



J + ^l 3 



re 



(403) 



for some positive integer $L_n$. We fix a sequence $t^n \in \mathcal{T}^n$ satisfying (353) for every $t \in \mathcal{T}$. Then, we consider
the test channel and the input distribution given by $P_{U^nX^n|S^n}(u^n, x^n|s^n) = P^n_{UX|TS}(u^n, x^n|t^n, s^n)$, and we use
Corollary 20 for $P_{U^nX^nS^nY^n} = P^n_S P_{U^nX^n|S^n} W^n$ by setting $\gamma_p = \log|\mathcal{M}_n|L_n + \log n$, $\gamma_c = \log L_n - \log n$, and
$\delta = \frac{1}{n}$. Then, there exists a GP code $\Phi_n$ whose average probability of error $P_e(\Phi_n; \Gamma)$ satisfies



1-P ($ n ;r) >Pr Xi,Yi\ti) > 



log|.M n |L. 

- log L n 

-rer 
— J) > z - 




2 

re 



(404) 



(405) 



Now the rest of the proof proceeds by using the multidimensional Berry-Esseen theorem as in (356) to (358) for
the WAK problem. ■